Abstract 


The Reliable Optical Bus (ROBUS) is the core communication system 
of the Scalable Processor-Independent Design for Enhanced Reliability 
(SPIDER), a general-purpose fault-tolerant integrated modular 
architecture currently under development at NASA Langley Research 
Center. The ROBUS is a time-division multiple access (TDMA) 
broadcast communication system with medium access control by means 
of a time-indexed communication schedule. ROBUS-2 is a developmental 
version of the ROBUS providing guaranteed fault-tolerant services to the 
attached processing elements (PEs) in the presence of a bounded 
number of faults. These services include message broadcast (Byzantine 
Agreement), dynamic communication schedule update, clock 
synchronization, and distributed diagnosis (group membership). The 
ROBUS also features fault-tolerant startup and restart capabilities. 
ROBUS-2 is tolerant to internal as well as PE faults, and incorporates a 
dynamic self-reconfiguration capability driven by the internal diagnostic 
system. This version of the ROBUS is intended for laboratory 
experimentation and demonstrations of the capability to reintegrate 
failed nodes, dynamically update the communication schedule, and 
tolerate and recover from correlated transient faults. 

1. Introduction 


The Scalable Processor-Independent Design for Enhanced Reliability (SPIDER) is a general-purpose 
distributed computer architecture currently under development at NASA Langley Research Center. The 
purpose of this effort is to design a flexible architecture that can be configured to satisfy a wide range of 
performance and reliability requirements, while preserving a consistent interface to application programs. 
One of the development goals is for the architecture to scale efficiently from a small 
configuration supporting a single aircraft function to a large distributed configuration performing multiple 
functions simultaneously. The architecture is expected to support functions of various criticality levels, 
including ultra-reliable and safety-critical aircraft functions with hard real-time deadlines. 

SPIDER is designed as an integrated modular architecture (IMA) composed of a communication 
system and a set of processing elements (PEs). The Reliable Optical Bus (ROBUS) is a fault-tolerant, 
time-division multiple access (TDMA) broadcast communication system with medium access control by 
means of a time-indexed communication schedule. The ROBUS provides a set of basic communication 
services, and its essential goal is to ensure reliable communication between all pairs of fault-free PEs. 
The PEs perform two basic functions: execute the application software and run the distributed operating 
system (SPIDER-OS). The application-specific software executed by individual PEs may include 
processing of data, computing control functions, reading sensors, driving actuators, or providing a 
communication path to other networks (e.g., a gateway function). The SPIDER-OS handles the 
communication, process management, and redundancy management at the PE level. The SPIDER-OS 
consists of a commercial off-the-shelf (COTS) real-time operating system (RTOS) and a middleware 
layer located between the operating system and the application software. The SPIDER middleware 
provides an interface between applications running on the PEs and handles all the SPIDER-specific 
functions that are not a concern of application-specific software. The middleware enables the 
implementation of fault-tolerant strategies combining the PEs to provide fail-operational and fail-stop 
capabilities in a way that is transparent to the application software. The redundancy management 
strategies at the PE level are flexible and can be adapted to support dissimilar processors. 

The ROBUS is the central feature of SPIDER in the sense that it provides a set of basic services and 
guarantees upon which higher-level services are built. The approach selected for the development of 
SPIDER includes the design and implementation of concept demonstration versions of the ROBUS. 
Although it has fairly straightforward behavior at the external interfaces, internally the ROBUS is in fact 
a distributed system consisting of dedicated protocol processors that perform ROBUS-specific functions 
and are interconnected by a lower-level communication network. The developmental versions of the 
ROBUS will be leveraged in laboratory investigations to assess the effectiveness of the distributed 
protocols and the redundancy management strategies and to expose areas where further research and 
development is required. These demonstration versions of the ROBUS will also be used as test beds for 
the development of the SPIDER-OS. 

This document provides a description of ROBUS-2, an instance of the ROBUS designed to 
demonstrate the following bus capabilities: re-integration of repaired nodes, dynamic update of the 
communication schedule, and tolerance of and recovery from correlated transient faults. This instance 
of the ROBUS also serves as a design case for the study of robustness and efficiency in implementations 
of the error detection, diagnosis, and reconfiguration strategies developed up to this point. In addition, 
ROBUS-2 is intended to demonstrate that the bus can achieve a PE-message throughput that approaches 
the available bandwidth at the physical communication layer, while preserving the fault-tolerance 
guarantees. 

The first version of the bus, ROBUS-1, is described in [Miner 02]. The design of ROBUS-2 is based 
on the unified fault-tolerance protocol discovered by Miner et al. [Miner 04]. That protocol is a 
generalization and extension of the Byzantine fault-tolerance protocol introduced by Davies and Wakerly 
[Davies 78]. 

[Rushby 03] presents a comparison of bus architectures for safety-critical applications, including 
SAFEbus, TTA, FlexRay, and SPIDER. 


1.1. Basic services 

ROBUS-2 provides four basic fault-tolerant services. 

• Message broadcast: Every scheduled message sent by a PE is delivered to all of the properly 
working PEs. Irrespective of the status of the source PE, all of the properly working PEs will agree 
on the content of each message. If the source PE is working properly, all of its messages will be 
received exactly as they are sent. 

• Communication schedule update: The PEs can dynamically modify the bus access pattern by 
downloading a new communication schedule to the ROBUS. 

• Time reference: ROBUS-2 provides an accurate and precise time reference to the PEs, which they 
can use to coordinate their actions. 

• Self-diagnosis: ROBUS-2 can detect and diagnose internal failures with a high degree of coverage. 
Diagnosed component failures are periodically reported to the PEs so they can react appropriately 
according to their application. 


1.2. Additional features 

Other features of ROBUS-2 include the following. 

• Time-triggered operation: Normal activity on the bus is controlled by time-indexed internal 
operation schedules that specify exactly when to begin the processing for each service and, for most 
protocols, exactly when to start all the transmissions. In addition, a highly effective fault-tolerant 
time synchronization protocol enables the bus to measure time with fine resolution. These are critical 
elements that give the bus the ability to deliver services with predictable timing, even in the presence 
of faults. 

• Communication schedule enforcement: ROBUS-2 grants access to the bus only as indicated by the 
communication schedule. The enforcement mechanism ensures that faulty PEs do not interfere with 
other PEs accessing the bus. 

• Self-reconfiguration: Internal error detection and diagnosis allow ROBUS-2 to quickly identify and 
neutralize failed internal components. These mechanisms also allow the bus to re-integrate repaired 
components. 

• Internal-fault masking: ROBUS-2 incorporates a fault-masking capability that allows it to tolerate a 
bounded number of undiagnosed internal component failures. 

• Fault-tolerant startup and restart: The error handling mechanisms are active during initialization. 
This enables the bus to start up with variable initial configurations and in the presence of component 
failures. In addition, the error handling mechanisms enable ROBUS-2 to detect many transient errors 
and take appropriate actions to clear and re-integrate the affected components. These mechanisms 
coupled with the startup capability give ROBUS-2 the means to recover from some scenarios of 
massive transient faults affecting the system. 

• PE-fault tolerance: The ROBUS-2 design allows the bus to maintain internal coordination and continue service 
delivery independently of the number of failed PEs. Error detection applied to the communication 
schedule updates enables the detection of invalid schedules, in which case ROBUS-2 activates a 
default schedule to ensure that the PEs can continue to communicate. 

This version of the ROBUS is intended for implementations with a relatively small number of PEs, 
say fewer than seven. Future versions will include various design optimizations to enable efficient 
implementations with a much larger number of PEs. 


1.3. Document organization 

This document is intended to be a comprehensive and self-contained design reference including 
description and analysis. The following sections describe the design of ROBUS-2 in detail. The 
presentation begins with an overview of the behavior and structure of the bus. This is followed by a 
description of the message format and the distributed coordination strategy for the implementation of the 
ROBUS-2 protocols. The diagnostic system, including the diagnostic policy, is described. Then, the 
modes of operation of the bus are presented, including descriptions for each of the protocols. The 
appendices present relevant background concepts and the basic theory of fault tolerance and 
communication, as well as analysis for the ROBUS-2 protocols and the startup and restart capability. 
Throughout, the document provides insight into the operation of the design, including how to set up 
critical aspects of the system for an actual physical implementation. 

From this point on, we refer to the bus described here simply as “ROBUS”. It should be understood 
everywhere, unless explicitly stated otherwise, that we are referring to the ROBUS-2 version of the bus, 
and not to the ROBUS in general. 


2. System overview 

The following introduces the design of the ROBUS and serves as an overall reference for later 
sections, which cover particular design elements in detail. 


2.1. System behavior 

This section presents a brief overview of the behavior of the ROBUS. 


2.1.1. Basic states 

Figure 2.1 shows a simplified view of the high-level state transitions. The bus is deactivated by 
cutting off power to the system. When enabled, it executes an initialization routine and then proceeds to 
begin service delivery. The bus will remain engaged until it is deactivated or a failure condition is 
detected. If a failure occurs, the bus will try to re-establish service delivery as soon as possible. For 
ROBUS-2, all bus failures are presumed to be transient. Thus, the bus is designed to never give up trying 
to return to normal operation. 



Figure 2.1: Simplified high-level state-transition graph for ROBUS 


2.1.2. Steady-state operation 

The steady-state behavior of the ROBUS consists of a simple cyclic operation. As illustrated in Figure 
2.2, in each cycle the bus goes through a predetermined sequence of protocols to deliver the expected 
services: time reference, self-diagnosis, communication schedule update, and PE message broadcast. 
Note that Figure 2.2 is not drawn to scale. Most of the time in a cycle (say, over 90%) is available for the 
broadcast service. 

Figure 2.2: Service delivery sequence 

The Time Reference service provides a periodic time update in the form of a dedicated message 
simultaneously broadcast from the bus to the PEs. The period between updates, called the 
re-synchronization period, is nominally specified before run time. The time reference indicates the time 
kept by the bus, which is not synchronized to an external time source. (The PEs can maintain dedicated 
time clocks synchronized to an external time reference independently from the ROBUS time service. 
Those clocks would be updated periodically with adjustments agreed to by the PEs using an agreement 
protocol and communication via ROBUS.) 

During Self-Diagnosis, the bus sends out to the PEs the latest available results of internal diagnosis. 
The interval from the end of one self-diagnosis to the end of the next is called a diagnostic cycle. The 
protocol used for this service ensures that the PEs receive consistent diagnostic information. This 
information can be used by the PEs for process and redundancy management decisions at the SPIDER 
level. 

During Schedule Update, all the PEs simultaneously send their desired schedule to the bus. The 
schedule specifies the number of messages that will be transmitted by each PE during the next broadcast 
service. Ideally, all the PEs agree on the communication schedule before it is sent to the ROBUS. 
However, the ROBUS is designed to tolerate a condition in which there is no agreement among the PEs. 
This is accomplished by using error detection and an agreement generation protocol. If the ROBUS 
detects that the received schedule is invalid, it will reject it and a default schedule will be used. The final 
decision on the schedule to be used is forwarded back to the PEs. 
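
The accept-or-fall-back behavior of the Schedule Update service can be illustrated with a minimal sketch. The function name, the strict-majority vote, and the validity rule (non-negative counts whose total fits in the broadcast window) are assumptions made for illustration only; this document does not specify the agreement generation protocol or the error checks at this level.

```python
from collections import Counter


def select_schedule(proposals, default, max_messages):
    """Pick the schedule for the next PE Broadcast, or fall back to the default.

    proposals: one tuple per PE; entry i is the message count requested for PE i+1.
    The majority vote and the validity rule are illustrative assumptions,
    not the ROBUS-2 agreement generation protocol.
    """
    if not proposals:
        return default
    candidate, votes = Counter(proposals).most_common(1)[0]
    agreed = votes > len(proposals) // 2  # strict majority (assumed rule)
    valid = (
        len(candidate) == len(default)
        and all(isinstance(n, int) and n >= 0 for n in candidate)
        and sum(candidate) <= max_messages  # must fit in the broadcast window
    )
    # An invalid or non-agreed schedule is rejected and the default is used.
    return candidate if agreed and valid else default
```

In either case, the returned value plays the role of the final decision that is forwarded back to the PEs.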

In PE Broadcast, the ROBUS grants bus access to individual PEs according to the communication 
schedule. An interactive consistency protocol is used for each scheduled message to ensure that the PEs 
receive consistent messages. The bus access pattern is a time-indexed, as-soon-as-possible (ASAP) 
round-robin sequence. Figure 2.3 provides an example of the access pattern. The PEs are identified 
according to the statically assigned identification numbers which uniquely identify each ROBUS port. 
The PEs access the bus in ascending order according to the port identification numbers. The first 
scheduled message is sent at some predetermined time. The interval between the send time of one 
message and the send time of the next (known as the data introduction interval or DII) [De Micheli 94] 
is constant. After all the scheduled messages for one PE have been sent, the messages for the next PE are 
broadcast maintaining the DII between messages. If one PE is not scheduled to send messages, then the 
messages for the next scheduled PE are sent. After all of the scheduled messages are processed, the bus 
remains idle until the time to restart the Time Reference service. 
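
The access pattern described above can be sketched as a short routine that expands a communication schedule into send times. The function name, the dictionary representation of the schedule, and the time units are hypothetical; only the ordering rule (ascending port identification number) and the constant DII come from the description above.

```python
def broadcast_send_times(schedule, start_time, dii):
    """Return (send_time, port_id) pairs for one PE Broadcast phase.

    schedule maps a port identification number to the number of messages
    scheduled for the attached PE (a hypothetical representation).
    """
    slots = []
    t = start_time
    for port in sorted(schedule):        # ascending port identification number
        for _ in range(schedule[port]):  # PEs with no messages are skipped
            slots.append((t, port))
            t += dii                     # constant data introduction interval
    return slots


# Example: PE 3 has nothing scheduled, so PE 4 follows PE 2 directly.
print(broadcast_send_times({1: 2, 2: 1, 3: 0, 4: 1}, start_time=100, dii=10))
# [(100, 1), (110, 1), (120, 2), (130, 4)]
```

After the last slot, nothing further is scheduled, which corresponds to the idle interval before the next Time Reference service.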



[Figure: bus access by PE 1, PE 2, PE 4, PE 6, and PE 7 plotted against time] 

Figure 2.3: Example of an access pattern during the PE Broadcast service 


2.2. System structure 

Figure 2.4 shows the ROBUS topology. The bus has an active star architecture with the Bus 
Interface Units (BIUs) serving as the bus access ports and the Redundancy Management Units 
(RMUs) providing connectivity as network hubs. The network between BIUs and RMUs forms a 
complete bipartite graph in which each node is directly connected to every node of the opposite kind. 
Only the links shown are available for communication. There are no functional links between BIUs or 
between RMUs, and the RMUs have no direct links to the external world. All of the communication links 
are bidirectional. The design of the ROBUS is independent of the physical point-to-point communication 
technology and is suitable for use with point-to-point optical data links. 



Figure 2.4: ROBUS topology 

The number of BIUs, denoted by N, is fixed. The number of RMUs is denoted by M and is also fixed. 
Every BIU is assigned a unique node identification number from 1 to N. Likewise, the RMUs are 
assigned numbers from 1 to M. Each PE is uniquely identified by the number of its corresponding BIU. 
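
As an illustration of the topology, the following sketch enumerates the links of the complete bipartite graph for given N and M. The tuple encoding of node identities and the function names are assumptions made for this example.

```python
from itertools import product


def robus_links(n_bius, m_rmus):
    """Return the set of bidirectional BIU-RMU links for N BIUs and M RMUs."""
    bius = [("BIU", i) for i in range(1, n_bius + 1)]
    rmus = [("RMU", j) for j in range(1, m_rmus + 1)]
    # Complete bipartite graph: every BIU is linked to every RMU.
    return set(product(bius, rmus))


def connected(links, a, b):
    """True if nodes a and b share a direct link (links are bidirectional)."""
    return (a, b) in links or (b, a) in links


links = robus_links(4, 3)
print(len(links))                                    # 12, i.e., N * M
print(connected(links, ("BIU", 1), ("BIU", 2)))      # False
```

The link count N * M grows quickly with the number of nodes, which is one reason this kind of topology is most practical for small configurations.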

Using Figure 2.4 it is easy to see how the communication schedule can be enforced. Since the PEs are 
connected to the bus via the BIUs, it is the responsibility of each BIU to ensure that the messages from its 
attached PE are forwarded to the RMUs only at allowed times. Similarly, since the BIUs are attached 
directly to the RMUs, the RMUs are responsible for ensuring that only the messages from the scheduled 
BIU (and its corresponding PE) are relayed back to the BIUs. The most important aspect of the 
bus-access enforcement mechanism is to control access to the RMU-to-BIU links. 


2.3. Node behavior 


Figure 2.5 presents a simplified view of the high-level state-transition graph for the ROBUS nodes. 
Both BIUs and RMUs follow this same pattern of behavior. This graph is essentially 
the same as the one for the ROBUS shown in Figure 2.1. In the Disabled state, a node is powered off or 
otherwise removed from active bus participation. Once enabled, a node enters the Initializing state where 
it tries to find other nodes suitable for providing communication services to the PEs. Once a node has 
confirmed that it is operating in a proper configuration with other nodes, it enters the Engaged state. To 
deliver services to the PEs, it is necessary for a group of BIUs and RMUs to work together in a 
coordinated way. We refer to a group of BIUs and RMUs that can be relied upon to deliver proper 
services to the PEs as a clique. An initializing node becomes engaged after it identifies a clique and 
becomes part of it. If a node determines that a significant failure condition is present while it is part of 
a clique, the node transitions back to the Initializing state to reset its state and attempt to re-engage. A 
ROBUS node can be designed with the capability to transition to the Disabled state when it determines 
that it cannot form or join a clique due to local permanent faults or some condition that is outside the 
recovery capabilities and is interpreted as a permanent failure. That feature, illustrated by the dashed 
arrow, is not included in ROBUS-2. 
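
The node behavior described above can be sketched as a small transition table. The event names are hypothetical labels for the conditions given in the text, and the self-disabling transition is deliberately omitted because it is not part of ROBUS-2.

```python
from enum import Enum, auto


class NodeState(Enum):
    DISABLED = auto()
    INITIALIZING = auto()
    ENGAGED = auto()


class Event(Enum):  # hypothetical event names
    ENABLE = auto()            # node is powered on or otherwise enabled
    CLIQUE_JOINED = auto()     # node identified a clique and became part of it
    FAILURE_DETECTED = auto()  # significant failure seen while in a clique
    DISABLE = auto()           # node is powered off (external action)


TRANSITIONS = {
    (NodeState.DISABLED, Event.ENABLE): NodeState.INITIALIZING,
    (NodeState.INITIALIZING, Event.CLIQUE_JOINED): NodeState.ENGAGED,
    (NodeState.ENGAGED, Event.FAILURE_DETECTED): NodeState.INITIALIZING,
    (NodeState.INITIALIZING, Event.DISABLE): NodeState.DISABLED,
    (NodeState.ENGAGED, Event.DISABLE): NodeState.DISABLED,
}


def step(state, event):
    """Return the next state; events with no listed transition are ignored."""
    return TRANSITIONS.get((state, event), state)
```

Because there is no transition from Initializing to Disabled other than the external one, a node in this sketch keeps trying to re-engage, consistent with the presumption that failures are transient.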



Figure 2.5: Simplified high-level state-transition graph for BIUs and RMUs 


2.4. Node structure 

Figure 2.6 depicts the basic structural components of a ROBUS node. This decomposition applies to 
BIUs and RMUs. The Communication Module handles all the point-to-point communication and uses 
mostly commercial off-the-shelf (COTS) components. The links between BIUs and RMUs implement 
broadcast communication using either one-to-one or one-to-many links. If the BIUs and the PEs are 
physically separate (see the topic Fault containment in a later section), the interconnection between them 
must use one-to-one links. If they are not separate, then some other means for local data transfer can be 
used. 

The Computation Module, also known as a ROBUS Protocol Processor (RPP), handles all the 
ROBUS-specific functions including mode transition logic, low-level protocols, error detection, 
diagnosis, reconfiguration, and distributed coordination. 






Figure 2.6: Generic node structure for BIUs and RMUs 


2.5. Distributed coordination 

Each ROBUS node is driven by an independent, free-running physical oscillator. These oscillators are 
characterized by a known bound on their drift rate with respect to real time. Each node also has a 
logical-time clock, referred to as the local-time clock, which keeps track of the passage of time as indicated by 
the physical oscillator. Given an initial precision of synchronization for the local times at any two nodes, 
the precision can worsen over time at a rate determined by the drift rate of the physical oscillators. 

The ROBUS protocols are divided into two categories: synchronization protocols and synchronous 
protocols. The synchronization protocols use event-triggered communication and event-processing 
operations to generate high-precision distributed events that are used to synchronize the local-time clocks. 
The synchronous protocols use time-triggered communication and operations in order to process 
information. To achieve proper coordinated action in the execution of the synchronous protocols, the 
local-time clocks of the participating nodes must be synchronized within some known bounded precision. 

The ROBUS has two synchronization states: synchronized and unsynchronized. In the synchronized 
state, the precision of synchronization is determined by an internal distributed reference event generated 
by a clock synchronization protocol. The precision of this event allows the nodes to achieve very tight 
local-time synchronization. The bus is in the unsynchronized state when it transitions to the startup and 
restart processes. The precision of synchronization in this state is mainly determined by events not 
directly controlled by the bus. It is assumed that the synchronization precision in this mode has a known 
bound that can be large relative to the precision in the synchronized state. The bus transitions from the 
unsynchronized state to the synchronized state after the execution of a synchronization initialization 
protocol. Because the local times can drift apart, the synchronization protocol must be re-executed at 
regular intervals to ensure that the local times are kept synchronized. The rate of re-synchronization is 
constrained by physical parameters of the design (e.g., oscillator drift rates) as well as precision and 
accuracy goals. The fault-tolerance attribute of the synchronization protocol enables the bus to maintain 
synchronization even in the presence of failed nodes. 
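
The relation between oscillator drift and the re-synchronization rate can be illustrated with a simple linear bound. This is a sketch under the assumption that each oscillator deviates from real time by at most a dimensionless rate `drift_rate`, so that two local clocks separate at a rate of at most twice that value; the function and parameter names are hypothetical, and the actual ROBUS-2 timing analysis is not reproduced here.

```python
def worst_case_skew(initial_skew, drift_rate, elapsed):
    """Upper bound on the skew between two local clocks after `elapsed` time.

    Assumes each oscillator drifts by at most `drift_rate` (e.g., 1e-4 for
    100 ppm), so two clocks can diverge at up to 2 * drift_rate.
    """
    return initial_skew + 2.0 * drift_rate * elapsed


def max_resync_period(initial_skew, drift_rate, skew_budget):
    """Longest interval between re-synchronizations that keeps the skew
    within `skew_budget` under the linear model above."""
    if skew_budget < initial_skew:
        raise ValueError("skew budget is smaller than the post-sync skew")
    return (skew_budget - initial_skew) / (2.0 * drift_rate)
```

For example, with a post-synchronization skew of 1 microsecond, a drift bound of 100 ppm, and a skew budget of 5 microseconds, this model allows at most 20 milliseconds between re-synchronizations.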

Startup and restart of the bus are particularly difficult scenarios to handle properly, especially in the 
presence of arbitrary faults. The ROBUS achieves synchronization during startup and restart by 
exploiting the properties of the initial synchronization protocol. With this protocol, the ROBUS can 
synchronize if the nodes start within a known bound of the relative local-time skew. The critical property 
concerning this capability of the synchronization protocol is that, although the initial relative skews must 
have a known bound, this bound can be arbitrarily large. This feature enables the use of physical events 
beyond the sphere of control of the nodes as distributed reference events to coordinate the startup and 
restart processes. The local power-on enable, which is externally controlled by the system user, is used 
by the bus as a reference event for startup. The detection of a bus failure, which is triggered by some 
fault-causing phenomenon, is used as a reference event for restart. The worst-case precision of these 
events determines the bound on the initial relative local-time skew in the unsynchronized state. 

The execution of synchronous protocols is driven by the local time and a time-indexed operation 
schedule. The low-level distributed protocols specify the node activities by defining the operations, the 
operation sequencing, the message flow patterns, and the executing nodes for each operation. The timing 
of the operations is determined using a model of distributed synchronous composition. This execution 
scheme and the high synchronization precision in the synchronized state make the steady-state behavior of 
the ROBUS highly deterministic as it precisely specifies the timing of all the internal communication 
between BIUs and RMUs, as well as the communication with the PEs. The concept of distributed 
synchronous composition is explained in detail in Section 3. 


2.6. Redundancy management 

The purpose of redundancy management is to increase the probability of continued service delivery 
through effective utilization of available resources. The ROBUS is designed to manage its redundant BIU 
and RMU components independent of the PEs. 


2.6.1. Fault containment 

Fault containment refers to the confinement of physical faults to a limited locality. This is achieved by 
establishing containment boundaries defined by fault propagation barriers that prevent faults from 
spreading indiscriminately throughout the system. Each area enclosed by containment boundaries is 
known as a fault containment region (FCR) (see [Lala 91]). Ideally, the FCRs are independent from 
each other in the sense that physical faults in one FCR will not cause faults in others. Communication 
between FCRs is through carefully specified interfaces that ensure a sufficiently high degree of fault 
containment. Fault containment is a fundamental requirement of most fault-tolerant systems. Driscoll et al. 
[Driscoll 03] present a particularly devious fault propagation mechanism that can wreak havoc in a 
system if not properly addressed in the design of the FCRs. For ROBUS, every BIU and RMU node is in 
a separate FCR. Each BIU can be by itself in an FCR, or it can share an FCR with its attached PE. 

Although FCRs can prevent the propagation of faults, they do not preclude the simultaneous presence 
of physical faults in separate FCRs caused by independent phenomena internal to the system. In addition, 
external threats like lightning and high-intensity radiated fields (HIRF) have the potential to disturb 
multiple FCRs. It is presumed that the fault-containment solution does not prevent the propagation of 
environment-induced faults within an FCR. Therefore, when a fault is detected in an FCR, all the 
components within the FCR are presumed to be affected and no specific assumptions are made about the 
behavior of the corresponding ROBUS node. The ROBUS is designed with mechanisms that can handle 
a large number of coincident faults and arbitrary fault manifestations. 


2.6.2. Error detection 

Error detection is based on the comparison of actual attributes of observed data against expected 
attributes. The ROBUS nodes use six categories of error detectors. These checks generate syndromes 
that are used to diagnose the system. 

• Communication checks: Each communication link should have a high-coverage error-detection 
capability for errors occurring anywhere from the transmitter to the receiver. 

• In-line checks: These checks individually compare received messages against expected 
characteristics of timing and content. 

• Cross-lane checks: These checks compare received messages against the result of a vote. The 
checks are performed on timing and content characteristics. 

• Protocol checks: These checks are essentially sanity checks on intermediate and/or final protocol 
results based on expected behavioral characteristics of the ROBUS. 

• Self-checks: These checks are performed by a node to monitor its own operation. The self-checks 
described in this document are based on properties of the ROBUS protocols. Other 
protocol-independent or application-specific checks can be defined to increase the error coverage. 

• PE-error checks: These checks are not specified in this document. However, the system is designed 
to accept and process error syndromes about expected PE messages at the BIUs. 


2.6.3. Diagnosis 

Each BIU and RMU node is an observer of every node. An observed node is known as a defendant. 
A direct observer receives information from the defendant by way of a direct data communication link. 
An indirect observer receives information from the defendant by way of direct observers. Due to the 
ROBUS topology, a node is a direct observer of nodes of the opposite kind and an indirect observer of 
nodes of the same kind, including itself. Every ROBUS node is a defendant and an observer. The 
purpose of diagnosis is to assess the status of each node and the bus as a whole. The diagnostic system of 
the ROBUS is a distributed system divided into two layers. In the local layer, the nodes monitor the 
communication and independently diagnose each node and the bus. In the collective layer, the nodes 
exchange diagnostic information to augment their local diagnoses. 

The diagnostic system assesses each node to determine its suitability to participate in the delivery of 
services to the PEs. A trustworthy node can be relied upon to deliver the expected services. 
Untrustworthy nodes do not behave as expected and, thus, are sources of errors. The causes of errors by 
a node can be physical defects or disturbances, or incorrect values held in the state variables. 

There are three steps to diagnose a node: error detection, culprit identification, and assessment. The 
error checks of the types described in the previous section are used to generate error syndromes. Error 
sources are identified using the error syndromes and knowledge of the protocols and the topology. Some 
error syndromes unequivocally point to a single error source, while others are ambiguous and require the 
combination of multiple syndromes in order to locate the error source. The diagnostic system uses a local 
hierarchical classification scheme and policy-based rules to assess the status of each node. Each step in 
the hierarchy corresponds to an increase in the severity of the assessment. A node is suspected by an 
observer when it determines that the defendant is one of several possible culprits for a detected error. A 
node is blamed when an observer determines that the defendant is a source of detected errors. A node is 
accused by an observer when it determines that the defendant is untrustworthy, but is uncertain whether 
other observers have reached the same conclusion. A node is convicted when the observers agree that a 
sufficient number of them consider the defendant untrustworthy. For this version of the ROBUS, each 
node uses Boolean variables for all the diagnostic information. 
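
The hierarchical classification can be sketched as an ordered enumeration driven by Boolean flags. The `CLEAR` level, the function names, and the fixed conviction threshold are assumptions made for illustration; the diagnostic policy that sets the flags and the protocol by which observers reach agreement are not shown, and the sketch assumes all observers already hold the same accusation vector.

```python
from enum import IntEnum


class Assessment(IntEnum):
    """Severity increases with the integer value."""
    CLEAR = 0      # hypothetical label for "no finding"
    SUSPECTED = 1  # one of several possible culprits for a detected error
    BLAMED = 2     # identified as a source of detected errors
    ACCUSED = 3    # judged untrustworthy by this observer
    CONVICTED = 4  # enough observers agree the defendant is untrustworthy


def local_assessment(suspected, blamed, accused):
    """Most severe local assessment implied by an observer's Boolean flags."""
    if accused:
        return Assessment.ACCUSED
    if blamed:
        return Assessment.BLAMED
    if suspected:
        return Assessment.SUSPECTED
    return Assessment.CLEAR


def is_convicted(accusations, threshold):
    """True if at least `threshold` observers accuse the defendant.

    `accusations` holds one Boolean per observer; `threshold` is an
    assumed policy parameter.
    """
    return sum(1 for a in accusations if a) >= threshold
```

Using an ordered enumeration makes the "increase in severity" at each step of the hierarchy directly comparable, while the inputs remain Boolean as stated above.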

The BIU and RMU members of a clique work together in a coordinated way to deliver services to the 
PEs. A clique is considered trustworthy if it is suitable to deliver services according to the specification. 
The diagnosis of the bus consists of determining if a trustworthy clique is in operation. For this version 
of the ROBUS, it is assumed that at any time there is at most one stable trustworthy clique on the bus. 
The diagnostic system uses error syndromes, knowledge of the protocols, the results of node diagnostics, 
and policy-based rules to assess the status of the bus. 


2.6.4. Reconfiguration 

The purpose of reconfiguration is to enhance the ability of a clique to establish and preserve proper 
service delivery in the presence of untrustworthy nodes. The membership of a clique is determined using 
the results of diagnosis. A clique is reconfigured by adding or removing nodes from its membership. A 
member of a clique is allowed to participate in the delivery of services to the PEs and is referred to as a 
trusted node. We refer to a node searching for or trying to become part of a clique as a recovering node. 

The reconfiguration strategy of the ROBUS is driven by the need to handle scenarios with a large 
number of simultaneous or nearly simultaneous node failures caused by harsh environmental phenomena. 
Although the ROBUS has the capability to re-initialize a failed clique, the preferred way to handle a fast 
increase in the number of untrustworthy nodes is to preserve the delivery of services by quickly removing 
as many untrustworthy nodes as possible. The presence of a surviving clique forces recovering nodes to 
execute a re-integration procedure to rejoin the clique. The re-integration procedure of the ROBUS is 
considered more robust than the re-initialization procedure, which has strict assumptions about the 
duration of the fault-causing phenomenon and the failure detection delay. In addition, the coordinated 
and highly deterministic activity of a clique engaged in service delivery to the PEs enables the application 
of detailed error detection and diagnosis by the recovering nodes and the clique. This allows the 
expansion of the clique to proceed with a high level of protection against untrustworthy nodes. Another 
advantage of preserving a degraded clique is that it increases the likelihood that at least some PEs can 
continue to do useful work. 


2.6.5. Error containment 

Error containment refers to the establishment of barriers to prevent incorrect information from 
propagating throughout the system. The error propagation barriers define partitions called error 
containment regions (ECRs). Similarly to the FCRs, every BIU and RMU is in a separate ECR. Also, 
each BIU can be by itself in an ECR, or it can share an ECR with its attached PE. 

The only error propagation path between ECRs is through their interfaces. Thus, error containment 
can be achieved by placing barriers at one or both ends of each interface. The effectiveness of an error 
propagation barrier, referred to as the error-containment coverage, is measured by the probability that 
errors will not propagate across the barrier. For the interfaces between BIUs and RMUs, error 
containment is realized by a fail-stop mechanism to block errors at the source end of an interface, and 
input error detection and voting to block errors at the receive end. The use of error propagation barriers 
between BIUs and PEs is optional and their definition is not part of this document. 

2.6.5.1. Fail-stop nodes 

The goal of fail-stop behavior is to increase the error-containment coverage of an interface. Errors at a 
source node can affect output transmissions in unknown ways. Fail-stop behavior prevents the 
indiscriminate propagation of errors out from an ECR by mapping detected failures to a condition of no 
output activity, which can be consistently identified by the nodes at the receiving end as an indication of 
an untrustworthy source. 

The ROBUS nodes disable their output ports as soon as a local failure or a bus failure is detected. 
These conditions are indications that a node should not continue with normal activity because its 
transmissions are likely to be erroneous or the receiving nodes are not operating properly. 

The fail-stop reaction of the ROBUS nodes is not permanent. As mentioned in a previous section, the 
nodes in this version of the ROBUS do not implement a transition to a disabled state. Instead, following a 
failure, the nodes always try to recover and re-enable their outputs as required by the recovery procedures. 


2.6.5.2. Input error detection 

Input error detection prevents errors from entering an ECR. The location of detectors at the receiving 
end of an interface allows them to provide coverage for errors originating at the transmission source or 
somewhere in the communication path from the source to the receiver. In the ROBUS, input error 
detection is realized by the communication and in-line checks. 


2.6.5.3. Dynamic voting 

Most ROBUS operations involve redundant sources and voting performed at the receivers to reduce 
the information to a single result. As for the case of input error detection, voting at the receiving end of 
an interface provides protection against errors originating at a transmission source or in a communication 
path. The voting operations used by the ROBUS fall under the general category of dynamic voting, in 
which only a selected group of inputs is considered in the voting operation. The sources whose inputs are 
allowed to participate are called the eligible voters. The selection of eligible voters is based on the 
available results of node diagnosis and error detection performed on the inputs. Dynamic voting enables 
the ROBUS to quickly apply diagnostic results in order to enhance error containment and is the 
foundation of the internal-fault-masking feature of the bus. Three types of voting operations are defined 
for this version of the ROBUS: middle-value-select event voting, exact-match majority word voting, and 
exact-match majority bit voting. 

Middle-value-select event voting is the basic operation used by the synchronization protocols to 
process timing events. In these protocols, the voting function, referred to as the Accept function, 
produces an output a fixed delay after it receives the middle event from the eligible voters. Let E denote 
the number of eligible voters. The middle event is defined as event number $\lceil (E + 1)/2 \rceil$. Equivalently, the 
middle event is the first event after $\lfloor E/2 \rfloor$ events have been received. 
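
The middle-event rule can be illustrated with a minimal Python sketch. The function names, the list of event times, and the offline (all events already known) evaluation are assumptions made for illustration; an actual Accept function processes events as they arrive.

```python
def middle_event_index(eligible: int) -> int:
    """1-based index of the middle event, ceil((E + 1) / 2) == floor(E / 2) + 1."""
    if eligible < 1:
        raise ValueError("at least one eligible voter is required")
    return eligible // 2 + 1


def accept_time(event_times, fixed_delay):
    """Output time of a simplified Accept function (illustrative only).

    event_times: arrival times of the events from the eligible voters.
    fixed_delay: delay between the middle event and the output.
    """
    ordered = sorted(event_times)
    middle = ordered[middle_event_index(len(ordered)) - 1]
    return middle + fixed_delay
```

With three eligible voters the second event is the middle event; with four, the third event is.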

The unit of data for exact-match majority word voting is the multi-bit word. For this operation, 
referred to as a word vote, there is an exact-match majority among the input eligible voters if at least 
$\lceil (E + 1)/2 \rceil$ of the input words are exactly equal. Two eligible inputs are equal if they are an exact match in a 
bit-by-bit comparison. If there is a majority, the result of the vote is equal to the majority word. 
Otherwise, the result is undetermined and a no-majority condition is asserted. 
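
A word vote along these lines could be sketched as follows. Words are modeled as Python integers, so integer equality stands in for the bit-by-bit comparison; the function name and the use of None as the no-majority indication are assumptions for this sketch.

```python
from collections import Counter

NO_MAJORITY = None  # hypothetical sentinel for the no-majority condition


def word_vote(eligible_words):
    """Exact-match majority vote over the words from the eligible voters."""
    count_eligible = len(eligible_words)
    if count_eligible == 0:
        return NO_MAJORITY
    threshold = count_eligible // 2 + 1  # ceil((E + 1) / 2)
    word, occurrences = Counter(eligible_words).most_common(1)[0]
    return word if occurrences >= threshold else NO_MAJORITY
```

Note that a two-two split among four eligible inputs does not reach the threshold of three, so the no-majority condition is asserted.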

The unit of data for exact-match majority bit voting is the bit. This operation, called a bit vote, is used 
for processing Boolean diagnostic variables like suspicions, accusations, and convictions. For this 
function there is an exact-match majority if at least $\lceil (E + 1)/2 \rceil$ of the eligible input bits are equal. If a 
majority of the eligible inputs are FALSE, the result is FALSE. Otherwise, the result is TRUE. This 
function definition is biased against the defendant. (This bias is justified by the analysis in Appendix F.) 
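
The bias of the bit vote can be seen in a short sketch. The function name is hypothetical, and the result for an empty set of eligible voters (TRUE) follows from a literal reading of the rule rather than from an explicit statement in the text.

```python
def bit_vote(eligible_bits):
    """Bit vote: FALSE only if a majority of the eligible inputs are FALSE."""
    threshold = len(eligible_bits) // 2 + 1  # ceil((E + 1) / 2)
    false_count = sum(1 for bit in eligible_bits if not bit)
    # Ties and missing majorities resolve to TRUE, i.e. against the defendant.
    return false_count < threshold
```

For example, a one-to-one tie between TRUE and FALSE yields TRUE.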


2.7. Major operational modes 

Figure 2.7 presents the mode transition graph for the ROBUS nodes. This graph applies to BIUs and 
RMUs. After a power-on enable, a node goes to the Self-Test major mode to perform a local 
initialization and test its circuitry. The node will remain in this mode indefinitely unless it successfully 
passes the test. After exiting this mode, the node proceeds to determine the status of the bus. 



Figure 2.7: Major operational mode transitions for ROBUS nodes 

The Clique Detection major mode consists of three minor modes. In Local Diagnosis Acquisition, a 
node uses unsynchronized local observations to make a first assessment of the likely members of a clique. 
In Synchronization Acquisition, the node attempts to synchronize to the clique. In Collective Diagnosis 
Acquisition, the node captures the health assessment for each node as determined by the clique during the 
execution of the distributed diagnosis protocol. If at any time during the Clique Detection mode the node 
determines that no clique is present, it will exit this mode and attempt to form a new clique. Otherwise, it 
will assume that a clique exists and will try to join it. 

A node transitions to the Clique Initialization major mode to form a new clique. The first minor 
mode is Initial Diagnosis, in which a node identifies other nodes also attempting to form a new clique. 
This is followed by the Initial Synchronization and Collective Diagnosis minor modes, where the nodes 
are synchronized and a consistent clique membership is established. 

When a node enters the Clique Join mode, its state is in agreement with the state of the clique. In this 
mode, the node runs for two diagnostic cycles, essentially trying to demonstrate that it can be trusted. 
The existing members of the clique will integrate the node as soon as they confirm that the admission 
rules have been satisfied. 

In the Clique Preservation major mode, a clique delivers services to the PEs according to the 
operation schedule. In the Schedule Update minor mode, a schedule-download protocol is executed to 
allow the PEs to reprogram the bus according to their communication needs. During PE 
Communication, first the PE messages are broadcast according to the communication schedule, and then 
the BIUs and RMUs exchange accumulated accusations against nodes of the opposite kind, which serves 
to enhance the diagnosis and reconfiguration capabilities of the bus. This is followed by a re- 
synchronization of the local time in the Synchronization Preservation mode and then a reassessment of 
the clique membership in the Collective Diagnosis mode. 
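
The nominal path through the major modes can be summarized in a small sketch. It covers only the forward transitions described in the text; the return paths taken after a local or clique failure are not modeled, and the names MajorMode, MINOR_MODES, and next_major_mode are illustrative assumptions.

```python
from enum import Enum


class MajorMode(Enum):
    SELF_TEST = "Self-Test"
    CLIQUE_DETECTION = "Clique Detection"
    CLIQUE_INITIALIZATION = "Clique Initialization"
    CLIQUE_JOIN = "Clique Join"
    CLIQUE_PRESERVATION = "Clique Preservation"


# Minor modes in the order given in the text (none are listed here for
# Self-Test or Clique Join).
MINOR_MODES = {
    MajorMode.CLIQUE_DETECTION: (
        "Local Diagnosis Acquisition",
        "Synchronization Acquisition",
        "Collective Diagnosis Acquisition",
    ),
    MajorMode.CLIQUE_INITIALIZATION: (
        "Initial Diagnosis",
        "Initial Synchronization",
        "Collective Diagnosis",
    ),
    MajorMode.CLIQUE_PRESERVATION: (
        "Schedule Update",
        "PE Communication",
        "Synchronization Preservation",
        "Collective Diagnosis",
    ),
}


def next_major_mode(mode, *, self_test_passed=True, clique_present=False):
    """Nominal (failure-free) successor of a major mode."""
    if mode is MajorMode.SELF_TEST:
        # A node stays in Self-Test until it passes the test.
        return MajorMode.CLIQUE_DETECTION if self_test_passed else mode
    if mode is MajorMode.CLIQUE_DETECTION:
        # Join an existing clique, otherwise try to form a new one.
        if clique_present:
            return MajorMode.CLIQUE_JOIN
        return MajorMode.CLIQUE_INITIALIZATION
    return MajorMode.CLIQUE_PRESERVATION
```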


2.8. Startup and restart 

The ROBUS has a flexible capability to set up a clique and change its membership using the 
reconfiguration mechanisms. These mechanisms do not have restrictions on the number of nodes that can 
be simultaneously removed or admitted to a clique. As long as the clique membership is not overrun by 
untrustworthy nodes, the trustworthy nodes will be able to continue service delivery. 

To start up a disabled bus, a group of BIUs and RMUs must be enabled within a known bounded time 
interval. Since there is no clique present, the nodes will reach the Clique Initialization mode and then 
transition to the Clique Preservation mode. The size of this initial clique can range anywhere from one 
BIU and one RMU to all BIUs and RMUs. Subsequently enabled nodes, if there are any, will detect the 
existing clique and follow the Clique Join path to be integrated into the clique. 

A node determines that a local failure has occurred when its self-check detectors are triggered or when 
it is removed from the membership of a clique. In this case, the node transitions to the Self-Test mode, 
and then it attempts to re-integrate into the clique. 

The nodes detect a clique failure when not enough BIUs and RMUs are trusted, and when the results 
of collective operations do not satisfy expected characteristics. It is possible for a clique in steady-state 
operation to recover from massive coincident transient faults that overwhelm its degradation and 
fault-tolerance capabilities. The re-initialization scheme assumes that the worst-case duration of a transient 
fault-causing phenomenon and the delay to detect the bus failure can be bounded. This is used to 
determine a bound on the initial relative local-time skew when entering the Clique Initialization mode. 

Although highly unlikely, it is theoretically possible for coincident transient faults to corrupt the 
system in such a way that the nodes are divided into multiple mutually exclusive cliques simultaneously 
operating on the bus. In general, the ROBUS does not have the capability to recover from such 
conditions. 


3. Communication and distributed coordination 


This section describes the mechanism for communication between BIUs and RMUs, and the approach 
used to coordinate their activities. The communication between PEs and BIUs is also described, 
including the general data transfer model used at the BIU interface. 


3.1. ROBUS Messages 

The unit of data transfer in the ROBUS is the ROBUS Message (RM). As shown in Figure 3.1, a 
ROBUS message is composed of a one-bit Tag field followed by a fixed-size Payload field. This basic 
format is used for all the protocols. The Tag field has one of two values: SPECIAL or DATA. The 
relation between the Tag field value and the corresponding bit value on the message is implementation- 
dependent. The format and content of the Payload field depends on the value of the Tag field and the 
context in which the message is used. 


[Figure: a ROBUS message consists of a Tag field (1 bit) followed by a Payload field (fixed number of bits).] 

Figure 3.1: ROBUS message format 


If the Tag field is SPECIAL, then the Payload field carries a bit pattern corresponding to one of the 
following labels. 


• SELF_TEST 
• CLIQUE_DETECTION 
• CLIQUE_INITIALIZATION 
• CLIQUE_JOIN 
• CLIQUE_PRESERVATION 
• VALID_SCHEDULE 
• ZERO_SCHEDULE 
• INVALID_SCHEDULE 
• INIT 
• ECHO 
• PE_ERROR 
• SOURCE_ERROR 
• NO_MAJORITY 


$S_{RM}$ denotes the number of SPECIAL ROBUS messages. The assignment of bit patterns to the 
Payload labels is an implementation decision. The listed labels are a collection of all the SPECIAL 
messages defined for this version of the ROBUS. The interpretation of each label is dependent on the 
context in which the message is used. 

If the Tag field is DATA, then the Payload field carries data with a format and content specific to the 
context in which the message is used. Three minor modes use DATA messages: Collective Diagnosis, 
Schedule Update, and PE Communication. 
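
As an illustration of the message format, a ROBUS message could be modeled as a tagged record. The bit values chosen for the Tag and the use of list positions as Payload bit patterns are assumptions of this sketch, since the text leaves both assignments to the implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Tag(Enum):
    SPECIAL = 0  # assumed bit value; implementation-dependent in the text
    DATA = 1     # assumed bit value


# SPECIAL labels defined for this version of the ROBUS.
SPECIAL_LABELS = (
    "SELF_TEST", "CLIQUE_DETECTION", "CLIQUE_INITIALIZATION",
    "CLIQUE_JOIN", "CLIQUE_PRESERVATION",
    "VALID_SCHEDULE", "ZERO_SCHEDULE", "INVALID_SCHEDULE",
    "INIT", "ECHO",
    "PE_ERROR", "SOURCE_ERROR", "NO_MAJORITY",
)


@dataclass(frozen=True)
class RobusMessage:
    tag: Tag
    payload: int  # bit pattern held in the fixed-size Payload field


def special(label: str) -> RobusMessage:
    """Build a SPECIAL message; the pattern is the label's list position."""
    return RobusMessage(Tag.SPECIAL, SPECIAL_LABELS.index(label))


def label_of(message: RobusMessage):
    """Return the label of a SPECIAL message, or None for a DATA message."""
    if message.tag is Tag.SPECIAL:
        return SPECIAL_LABELS[message.payload]
    return None
```

With the labels listed above, the number of SPECIAL messages is 13, so at least four payload bits are needed to encode them.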

For Collective Diagnosis, the Payload field of each ROBUS message carries diagnostic data in the 
form of a Boolean vector. Figure 3.2 illustrates the format of diagnostic messages for the case of D 
defendants. Element $b_i$ denotes a Boolean variable corresponding to an accusation or conviction against 
defendant i, which can be a BIU or an RMU. If the diagnosed defendants are BIUs, then D equals the 
number of BIUs, which is denoted by N. Otherwise, D is equal to the number of RMUs, denoted by M. 
The assignment of value to any unused bits is implementation-dependent. 
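
A possible packing of the diagnostic Boolean vector into the Payload field is sketched below. Placing the bit for defendant 1 in the least significant position and filling unused bits with zero are assumptions; the text does not fix the bit order and leaves unused bits implementation-dependent.

```python
def pack_diagnostics(bits, payload_width):
    """Pack Boolean diagnostic variables (defendant 1 first) into an integer."""
    if len(bits) > payload_width:
        raise ValueError("more defendants than payload bits")
    value = 0
    for position, bit in enumerate(bits):
        if bit:
            value |= 1 << position  # assumed: defendant 1 in the LSB
    return value  # unused high-order bits are left at zero (assumption)


def unpack_diagnostics(payload, defendants):
    """Recover the Boolean vector for the given number of defendants."""
    return [bool((payload >> position) & 1) for position in range(defendants)]
```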

[Figure: the Payload field holds the diagnostic data bits $b_1$, $b_2$, ..., $b_D$, followed by unused bits whose value is unspecified.] 

Figure 3.2: Payload format for diagnostic ROBUS messages 

For Schedule Update, the DATA messages carry the number of messages scheduled for a particular 
PE. For this version of the ROBUS, it is valid to schedule a single PE to source the maximum number of 
messages that the bus can send during PE Communication, which is denoted by $K_{PE,max}$. Therefore, the 
Payload field for Schedule Update DATA messages corresponds to an integer in the range 0 to $K_{PE,max}$. 

For the PE Broadcast protocol in the PE Communication mode, the DATA messages carry 
information from the PEs. The format of these messages is application-dependent. $L_{PE}$ denotes the 
minimum Payload width requirement for PE messages. The exchange of accusations after the completion 
of the scheduled broadcasts uses the payload format for diagnostic ROBUS messages. 

In addition to the protocols mentioned above, each BIU uses a DATA message to send its 
identification number to its attached PE. The Payload field for these messages corresponds to an integer 
number in the range of 1 to N. 

The width of the Payload field, denoted by $L_{PF}$, must satisfy the following constraint. 

$L_{PF} \geq \max(\lceil \log_2(S_{RM}) \rceil, N, M, \lceil \log_2(K_{PE,max} + 1) \rceil, L_{PE})$ (3.1) 
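
The payload width constraint can be evaluated with a short helper. This is a sketch with hypothetical parameter names: s_rm is the number of SPECIAL messages, n and m are the numbers of BIUs and RMUs, k_pe_max is the maximum number of messages schedulable for one PE, and l_pe is the minimum width required for PE messages. Integer bit lengths are used in place of floating-point logarithms.

```python
def ceil_log2(value: int) -> int:
    """ceil(log2(value)) for value >= 1, computed with integer arithmetic."""
    if value < 1:
        raise ValueError("value must be at least 1")
    return (value - 1).bit_length()


def min_payload_width(s_rm, n, m, k_pe_max, l_pe):
    """Smallest Payload field width that satisfies constraint (3.1)."""
    return max(ceil_log2(s_rm), n, m, ceil_log2(k_pe_max + 1), l_pe)
```

For instance, with 13 SPECIAL messages, 4 BIUs, 3 RMUs, a maximum of 255 scheduled messages, and 16-bit PE messages, the PE message width dominates and the minimum is 16 bits.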


3.2. Node process model 

Figure 3.3 illustrates the process decomposition for the Computation Module of the ROBUS nodes. 
The Mode, Local Time, Diagnostics, and Schedule Processes hold the state information of the node. The 
Receive, Computation, and Send Processes perform protocol-specific operations. The Computation 
Process handles all the computation required by the protocols. The Send and Receive Processes interface 
with the local Communication Module and handle the ROBUS-specific communication functions. For 
the BIUs, the PE Interface handles the communication with the PEs. Error checks are located throughout 
the processes as appropriate. The timing patterns of the processes vary depending on the protocol being 
executed. 


3.3. Communication between BIUs and RMUs 

The ROBUS requires bidirectional communication between each BIU and RMU pair. This is realized 
using independent communication links in each direction. The communication links must provide 
adequate protection against the propagation of physical faults between interconnected nodes. 

Figure 3.3: Main processes for ROBUS nodes 

The behavioral design of the ROBUS requires that every node is able to broadcast ROBUS messages 
to the nodes of the opposite kind, simultaneously transmit and receive messages, and independently 
receive messages from every node of the opposite kind. The broadcast transmission function can be 
implemented using one-to-one or one-to-many transmitters. The reception requirements for BIUs and 
RMUs are satisfied by having a separate and independent receiver for each node of the opposite kind. In 
order to limit the cost and complexity of the system, the communication resources use mostly COTS 
components and every node uses the same communication links for synchronization and synchronous 
protocols. Contention in accessing the communication resources is prevented by the proper scheduling of 
the protocols and their operations. 

The operation of the links is characterized by the delivery delay and the throughput. The delivery 
delay for a point-to-point link is the real time elapsed from the instant a ROBUS message is input to the 
transmitter until it is output at the receiver. The delivery precision is the range of variation of the 
delivery delay for a broadcast transmission. The throughput of a link is measured in terms of the 
minimum data introduction interval (or DII) [De Micheli 94], which is the minimum time required by 
the link between the instants at which consecutive messages are input to the transmitter. In general, 
smaller values of the delivery delay, delivery precision, and minimum DII result in a better performing 
system. 

The BIUs and RMUs use two communication models: fixed delay and synchronous. In the fixed- 
delay communication model, the transmissions are triggered by events at the sources and the receiving 
nodes process the messages as soon as they arrive. For these transmissions, the information of interest is 
in the timing of the message. Fixed-delay communication is used by the synchronization protocols to 
enable receiving nodes to measure the relative skew of events at the source nodes. Likewise, this mode of 
communication enables the source nodes to estimate the time of some events at the receiving nodes. In 
the synchronous communication model, the transmissions are triggered at the local time indicated by the 
time-indexed operation schedule and the receiving nodes buffer the messages until their scheduled time 
for processing. This buffering of the messages implements a deskewing function that synchronizes the 
received messages to the local time at the receiving nodes. The relevant information carried by the 
messages is in their content. This model of communication is used by the synchronous protocols. 


3.4. Distributed coordination 


The functionality of the nodes and their interaction must be well specified in order to achieve the 
desired ROBUS behavior. The functional description of the mode logic, the diagnostic system, and the 
protocols specifies the required functions, their sequencing, and the nodes where the functions are to be 
performed. Those areas of the ROBUS design are discussed elsewhere in this document. This section 
examines timing aspects in the implementation of the protocols. 

All the synchronization protocols are composed of the same basic operations. They are all event- 
driven, use only fixed-delay communication, and perform event voting using the Accept function. A 
requirement for all the synchronization protocols is that the Accept functions must receive all their 
corresponding valid inputs. However, the protocols differ substantially in the assumed initial precision. 
For this reason, the timing of their communication and computation processes is quite different. For 
Synchronization Preservation in the Clique Join and Clique Preservation modes, the nodes are assumed to 
be in the synchronized state with a relatively high synchronization precision. The beginning of this 
protocol is time triggered. Although the communication and computation processing are event-driven, it 
is possible to use protocol events and knowledge of their precision bound to determine local-time 
intervals for the expected times of transmission and reception. Based on this information, the Accept 
functions can be activated at specific local times when their corresponding input messages should be 
available. For Initial Synchronization in Clique Initialization mode, the nodes are in the unsynchronized 
state at the beginning of the protocol. The known bound of the starting synchronization precision enables 
the use of a time trigger for the protocol, but due to the large initial imprecision, no attempt is made to 
estimate the time of reception at the nodes executing the protocol. Because of this, the nodes must simply 
be ready to process the messages whenever they arrive. For Synchronization Acquisition in Clique 
Detection mode, the bus is assumed to be synchronized, but the recovering node is not. During this 
mode, a protocol is used to loosely synchronize to the re-synchronization interval of the clique operating 
in Clique Preservation mode. After this, the node must be ready to process the synchronization messages 
whenever they arrive. Appendix C analyzes the timing of the synchronization protocols in more detail, 
including the timing requirements for the internal processes of the nodes. 

The synchronous protocols use time-triggered communication and processing. The ROBUS is able to 
execute synchronous protocols in the unsynchronized and synchronized states by exploiting the known 
bounds on the local-time synchronization precision in each state. The timing of execution of the 
synchronous protocols is specified by a time-indexed operation schedule. The scheduling of operations is 
based on a distributed synchronous composition abstract model of the system in which a single 
oscillator drives a common local-time clock and fixed-delay processes corresponding to the 
communication and computation operations of the BIUs and RMUs. 

Figure 3.4 illustrates an example of the use of synchronous composition to schedule distributed 
processes. Element A of Figure 3.4 is the process dependency graph. P1 and P2 are computation 
processes, and COMM is a communication process. Element B of Figure 3.4 is the local time axis for the 
synchronous composition model. $T_{P1}$, $T_{COMM}$, and $T_{P2}$ are the start times for the processes, and $\Delta_{P1}$, $\Delta_{P2}$, 
and $\Delta_{COMM}$ are the process delays. For proper operation, the schedule must satisfy the following 
constraints: $T_{COMM} \geq T_{P1} + \Delta_{P1}$, and $T_{P2} \geq T_{COMM} + \Delta_{COMM}$. For the actual system, processes P1 and P2 
are performed on separate nodes driven by independent oscillators. Elements C and D of Figure 3.4 are 
the local time axes for the source and receiver nodes, respectively. The computation processes are simply 
started at their scheduled local times and run to completion using the computational resources of their 
respective nodes. The COMM process implements the synchronous communication. The message is sent 
(i.e., transferred to the link transmitter) by the source at the scheduled time. $\pi_{src,rcv}$ denotes the bound on 
the synchronization precision, which corresponds to the uncertainty in the time of transmission measured 
in real time. The nominal link delay is denoted by $\Delta_{link}$, and the link delay imprecision is $\epsilon_{PP}$. 
Considering these uncertainties, the expected time of reception is determined to be between $T_L$ and $T_H$. If 
a message arrives earlier than $T_H$, it is buffered at least until $T_H$. Therefore, the communication delay 
used for scheduling purposes is $\Delta_{COMM} = T_H - T_{COMM}$. Appendix B presents a more detailed analysis of 
the point-to-point communication process. Note that the computation delays used for scheduling are the 
worst-case delays. Similarly to the communication process, if computation results are available earlier 
than expected, they must be buffered until the next communication process is started. 
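
The two scheduling constraints of the synchronous composition model can be checked mechanically. The sketch below uses hypothetical argument names for the start times and worst-case delays of P1, COMM, and P2, all expressed in the same local-time units; it does not derive the latest reception time, which is analyzed in Appendix B.

```python
def comm_delay(t_comm, t_latest_reception):
    """Communication delay used for scheduling: latest reception minus send time."""
    return t_latest_reception - t_comm


def schedule_is_valid(t_p1, delay_p1, t_comm, delay_comm, t_p2):
    """Check that COMM starts after P1 completes and P2 starts after COMM completes."""
    comm_after_p1 = t_comm >= t_p1 + delay_p1
    p2_after_comm = t_p2 >= t_comm + delay_comm
    return comm_after_p1 and p2_after_comm
```

A schedule with P1 at tick 0 (delay 10), COMM at tick 10 (delay 7), and P2 at tick 17 satisfies both constraints with no slack; starting COMM at tick 9 would violate the first one.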

Figure 3.4: Coordinating distributed processes using synchronous composition 

A: Dependency graph, B: Timeline for synchronous composition, 

C: Timeline for source node, D: Timeline for receiver node 

The synchronous composition model is applied to the scheduling of the protocols and their operations. 
The scheduling for an actual implementation must take into consideration the throughput capacity of the 
links, computation, and diagnostic processes, and the interactions between succeeding protocols, 
including the interactions between synchronous and synchronization protocols. 


3.5. Communication between PEs and BIUs 

The ROBUS requires a bidirectional communication capability between BIUs and their attached PEs. 
A BIU and its PE can be in separate FCRs or they can share an FCR. If a BIU and its PE are in separate 
FCRs, the physical communication links must provide adequate barriers to the propagation of faults 
between the FCRs. In this case, the physical failure of the BIU is independent from the failure of the PE 
and the failure recovery process of the BIU is completely independent from the PE. Note that if a BIU 
fails, its attached PE is in effect disconnected from the bus. On the other hand, if a BIU and its PE share 
an FCR, a fault can propagate between the BIU and the PE. In this case, the physical failure of one is no 
different from a failure of the other. Therefore, the design must provide for the simultaneous recovery of 
both components. For this version of the ROBUS, this is handled by a common process that resets the 
BIU and the PE when a failure is detected on either of them. Irrespective of the FCR configuration, it is 
the responsibility of the PE to monitor the communication in order to determine the state of the BIU. 

The BIUs and the PEs exchange messages using two communication models: fixed delay and 
synchronous. The fixed-delay model is used with the time synchronization protocols and is essentially 
the same as for the communication between BIUs and RMUs. The fixed delay allows the PEs to 
synchronize their time using events at the BIUs as references. Synchronous communication is used with 
the synchronous protocols. Two different synchronous communication models are allowed. In the 
“tight” model, the transmission of messages between BIUs and PEs follows a strict schedule in which 
timing is specified down to the tick level. This is the model used for synchronous communication 
between BIUs and RMUs. In the “loose” synchronous communication model, the sending and receiving 
of messages by the PEs is only required to satisfy simple timing constraints. Since the BIUs have time- 
triggered operation, their input and output of PE messages follows a detailed time-indexed schedule. The 
send timing requirement for the PEs is that their messages must be available at the BIUs at or before the 
time at which the BIUs will read them. The receive timing requirement for the PEs is that they will get 
the messages after the BIUs generate them according to their schedule. The specific time at which the 
PEs send their messages and the delay in receiving BIU messages for the “loose” synchronous 
communication model is dependent on the implementation and the applications run by the PEs. The PE 
Interface at the BIUs is designed using a first-in first-out (FIFO) buffer abstraction for input and output. 
For input, it is assumed that each expected PE message is available or there is a corresponding error 
indication. For output, it is always assumed that the message can be output at its scheduled time without 
having to confirm that the PE is ready to receive it. 

The PEs send messages only during the schedule update and PE Broadcast services. In both cases, 
only DATA messages are sent. For these messages, the BIUs read the messages and broadcast them on 
the bus. PE-error checks are not described in this document. These checks are used to signal the BIU 
when expected messages are invalid or not available. In either case, a BIU will replace the expected PE 
message with a SPECIAL message with PE_ERROR payload field. 
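
The substitution performed by a BIU can be sketched as follows. Because the PE-error checks are not described in this document, the validity test used here (the payload must fit in the Payload field) is purely an assumption, as are the function name and the tuple representation of messages.

```python
def message_for_broadcast(pe_payload, payload_width):
    """Return the message a BIU would broadcast for one expected PE message.

    pe_payload: integer payload read from the PE, or None if no message
    was available when the BIU read its input.
    """
    unavailable = pe_payload is None
    # Assumed validity check: the payload must fit in the Payload field.
    invalid = not unavailable and not (0 <= pe_payload < (1 << payload_width))
    if unavailable or invalid:
        return ("SPECIAL", "PE_ERROR")
    return ("DATA", pe_payload)
```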

In addition to service messages, the PEs receive mode and identification messages from the BIUs. 
The mode messages enable the PEs to track the mode of their BIUs. A BIU will send a mode message to 
its PE every time there is a major mode transition and after every diagnostic cycle during the Clique Join 
and Clique Preservation modes. The mode messages are SPECIAL messages with the payload field set to 
SELF_TEST, CLIQUE_DETECTION, CLIQUE_INITIALIZATION, CLIQUE_JOIN or 
CLIQUE_PRESERVATION, as appropriate. The identification messages inform a PE of the 
identification number of its BIU, which is also the PE’s identification number. These are DATA 
messages with the payload field equal to the BIU’s identification number. This way of giving an 
identification number to a PE is preferred over setting it directly at the PE because it allows the use of 
generic software at the PEs and prevents a mismatch between the BIU and the PE identification numbers. 

Figure 3.5 illustrates the message exchange between a BIU and its attached PE during the Clique Join 
and the Clique Preservation modes. The mode and identification messages are sent to the PE between the 
Self-Diagnosis and Schedule Update services. During Schedule Update, the BIU reads the schedule 
submitted by its PE and sends to the PE the results decided by the bus for each PE. This is followed by a 
single message indicating the assessment of the new schedule. This consists of a SPECIAL message with 
the payload set to VALID_SCHEDULE, ZERO_SCHEDULE (i.e., the schedule is valid and equal to zero 
for all the PEs), or INVALID_SCHEDULE. If the schedule is invalid, the ROBUS will automatically 
switch to a default schedule. During the PE Broadcast service, a BIU will read the scheduled messages 
from its PE and output to the PE the result for all the scheduled messages. A broadcast result equal to 
PE_ERROR indicates that there was an error at the source PE. If the bus determines that the BIU of a 
source PE is not operating properly, then the result of the broadcast will be SOURCE_ERROR or 
NO_MAJORITY. A result of NO_MAJORITY indicates that the RMUs received different messages 
from the BIU. If the assessment of the schedule was ZERO_SCHEDULE, the PE Broadcast service is 
not executed and the ROBUS simply waits until it is time to execute the Time Reference service. During 
the Time Reference service, a BIU outputs a SPECIAL message with the payload set to INIT. The 
sending of this message is triggered by the reference event that the BIU will use to reset its local-time 
clock. For the Initial Synchronization and Synchronization Acquisition, the payload is set to ECHO to 
explicitly indicate that a different protocol event is used as a reference to reset the local-time clock. 
During the Self-Diagnosis service, the output of a BIU consists of two messages containing the diagnostic 
results for the BIUs and the RMUs. These are the last messages of the diagnostic cycle. The next 
messages are the mode and identification messages for the next cycle. 

Figure 3.5: Message exchange pattern between a BIU and its attached PE in Clique Preservation mode 
A: PE-to-BIU messages, B: Bus services, C: BIU-to-PE messages 


4. Diagnostic system 


Conceptually, the ROBUS nodes are composed of two separate but coordinated systems. The 
operational system handles the communication and computation activities required by the distributed 
protocols, in addition to all the processing associated with the mode logic, local time, and PE 
communication schedule. The diagnostic system monitors the operational system and provides timely 
information for reconfiguration and error containment. The main purpose of the diagnostic system is to 
ensure the continued operational survival of the bus. To support the fault-tolerance mechanisms, the 
diagnostic processes must work in close coordination with the operational system processes. The basic 
functions of the diagnostic system are to detect errors during the execution of the protocols, assess the 
status of individual nodes, and assess the status of the bus. The protocols use input-error detection and 
dynamic voting to protect against the propagation of errors into the error-containment regions (ECRs). 
Bus reconfiguration is based on the node assessment results generated by the diagnostic system. The 
ROBUS will continue to deliver services as long as the diagnostic system does not detect a failure of the 
bus. 

The PEs can obtain diagnostic information from the bus in several ways. The ROBUS provides 
explicit periodic updates about the diagnostic status of every node. The protocol for the PE Broadcast 
service not only allows the broadcasting of messages, but it also diagnoses the BIUs as they transmit 
messages, and the results are forwarded to the PEs during the time of the service. In addition, the 
protocols for the PE Broadcast and Schedule Update services provide diagnostic information in the form 
of message content that indicates erroneous behavior by individual PEs or by the group of PEs attached to 
BIUs that are part of the active clique. Each individual PE can also use observations about the behavior 
of its attached BIU to derive additional information about the status of the BIU and the bus. 


4.1. System structure 

Each ROBUS node is an observer of every node. An observed node is known as a defendant. A direct 
observer receives information from the defendant by way of a direct data communication link. An 
indirect observer receives information from the defendant by way of direct observers. Due to the ROBUS 
topology, a node is a direct observer of nodes of its opposite kind and an indirect observer of nodes of its 
same kind, including itself. Every ROBUS node is a defendant and an observer. 

Every ROBUS node performs the diagnostic functions of error detection, node assessment, and bus 
assessment. Error detection is the foundation of the diagnostic system. The communication checks 
monitor the communication links between the nodes. The in-line checks are applied to the received 
messages and are based on expected timing and content characteristics. The cross-lane checks also 
detect errors in received messages by comparing them against the result of dynamic voting. The protocol 
checks inspect received messages and voting results with respect to expected properties for intermediate 
and final protocol results. The self-checks are performed by a node to monitor its own operation. PE- 
error checks inspect the messages received by the BIUs from their attached PEs. All these error checks 
generate the syndromes from which diagnostic decisions are made. 

The diagnostic system uses a distributed hierarchical classification system in which the severity of the 
diagnostic assessment for a defendant is related to the degree of certainty about its untrustworthiness. 
The diagnostic assessment of nodes is performed at the local and collective levels. Each ROBUS node 
gathers and processes diagnostic syndromes in order to form a local opinion about the status of each 
defendant, including itself. The nodes then share local assessment results to make a collective assessment 
of each defendant. The overall assessment about the trustworthiness of a defendant is based on a 
combination of the local and collective assessments. 

At the local level, each node uses knowledge about the protocols to interpret the generated error 
syndromes. A defendant is suspected by an observer when the observer determines that the defendant is one of 
several possible culprits for a detected error. In this case, the observer must combine multiple error 
syndromes to decide if a particular defendant is an error source. A defendant is blamed when an observer 
determines that the defendant is the source of a detected error. Blame can be assigned directly from the 
individual error syndromes or through the processing of suspicions. A defendant is accused when an 
observer independently determines that the defendant is untrustworthy but the observer is uncertain that 
other observers have reached the same conclusion. The accusations are based on assigned blame for 
detected errors. 

At the collective level, the nodes exchange their accusations against defendants to form a common 
opinion about each one. The Collective Diagnosis protocol generates convictions by merging local 
diagnoses from direct and indirect observers for each defendant. A defendant is convicted when a sufficient 
number of observers consider it untrustworthy. 

The diagnostic assessment of the bus corresponds to assessing the status of the clique. The ROBUS 
nodes independently assess a clique based on locally available diagnostic information consisting of 
protocol check syndromes and the results of diagnostic node assessment. 

The results of the diagnostic system are used for reconfiguration, error containment, and operational 
mode decisions. The membership of a clique (i.e., its configuration) is the set of nodes trusted to 
participate in collective operations. A clique is reconfigured by removing or adding nodes based on the 
results of diagnostic node assessment. Error containment is realized by input-error detection, dynamic 
voting, and fail-stop behavior. Input-error detection uses the communication and in-line checks. 
Dynamic voting requires the determination of inputs eligible to participate in a voting operation. Voting 
eligibility depends on the protocol and other specifics of the operation being executed. Fail-stop behavior 
falls under the categories of error containment and operational mode decisions. This behavior is triggered 
by the detection of critical conditions like a local failure or a clique failure. Other conditions relevant to 
operational mode decisions include the results of self-tests and the absence of a valid clique. 


4.2. Diagnostic policy 

The diagnostic policy specifies how the diagnostic data is to be processed to generate the required 
results. Among the factors taken into consideration in the design of the diagnostic policy are the 
following: the fault scenarios that the system is expected to encounter; the requirement to support the 
admission of new nodes into an existing clique; the requirement to support fault-tolerant startup and 
restart; and the requirement to ensure validity and agreement of diagnosis among clique members. 

A clique is expected to survive, albeit in a degraded form, most scenarios of correlated transient faults 
in which external phenomena cause the simultaneous or nearly simultaneous failure of multiple nodes. 
To successfully handle these scenarios, the diagnostic system must rapidly reconfigure the clique by 
removing untrustworthy nodes before they have a chance to overwhelm the fault-masking mechanisms. 

Recovering nodes must have a way to join a clique operating in Clique Preservation mode. This 
requires that the recovering nodes have access to the state of the clique, especially the local time and the 
diagnostic state, and be able to acquire it. 


The ROBUS design must provide mechanisms to allow trustworthy nodes to reach agreement on the 
local time and the membership of the clique when starting up or restarting the bus. 

Most diagnostic decisions are independently made by the nodes based on their observations. The 
diagnostic system must satisfy certain general properties that ensure that the nodes are able to establish 
and maintain proper diagnostic-state agreement. 


4.2.1. Required properties 

A goal of the diagnostic system is to ensure that every untrustworthy defendant is eventually distrusted 
(i.e., completeness) and that only untrustworthy defendants are distrusted (i.e., correctness). If the fault 
model includes arbitrary asymmetric (“Byzantine”) faults (see Appendix A), it is impossible to guarantee 
both of these properties [Shin 87]. The design of the ROBUS sacrifices completeness in order to ensure 
correctness. 

The ROBUS diagnostic system must satisfy two basic properties: correctness and agreement for non- 
asymmetric defendants. The property of correctness requires that every distrusted defendant is indeed 
untrustworthy. Thus, for situations in which two or more defendants are possible culprits of a detected 
error, each must remain trusted unless there is additional evidence that indicates unambiguously that it is 
untrustworthy. This property ensures that the trusted set held by the observers includes all of the 
trustworthy defendants. The disadvantage of this required property is that the trusted set can also include 
untrustworthy ones. The main reason for requiring this property is to protect against premature 
exhaustion of available redundancy on the bus. The ROBUS includes a significant amount of error 
detection and diagnostic functions added to achieve high error coverage and increase the chances of 
successfully identifying untrustworthy nodes. 

The property of agreement for non-asymmetric defendants requires that all of the trustworthy 
observers of a particular kind in a clique agree on their diagnostic assessment of defendants that are not 
asymmetric. To satisfy this property, all of the trustworthy members of a clique that are of the same kind 
must use the same mechanisms and common information to diagnose defendants. The only exception is 
when a node is diagnosing itself, in which case it is allowed to use exclusive local information. 


4.2.2. General approach 

The activities of the operational system are organized in a hierarchy with the following levels: major 
mode, minor mode, protocol, (protocol) process, and (process) step. The diagnostic system processing 
is directly dependent on the activity performed by the operational system. Every operational major mode 
has a corresponding diagnostic system mode involving two diagnostic intervals. The local diagnostic 
interval is the time during which diagnostic information is gathered and processed locally in order to 
generate accusations. The collective diagnostic interval is the time between updates of the convictions. 
For this version of the ROBUS, the diagnostic information is represented in terms of Boolean variables 
(i.e., TRUE or FALSE). The diagnostic state variables of a node are the ones whose values are carried 
from one protocol process to the next. These include the suspicions, accusations, and convictions. 

In the Clique Preservation mode, a clique tries to maintain validity and agreement on the time and 
diagnostic state variables of its members. Figure 4.1 shows the minor modes and the diagnostic intervals 
for this major mode. The nodes gather local diagnostic information from the beginning of an execution of 
Collective Diagnosis to the beginning of the next. The locally generated accusations are submitted for 
processing at the beginning of Collective Diagnosis and a consistent update to the convictions is received 
at the end. At the beginning of Collective Diagnosis, the nodes copy their current local accusations to a 
temporary memory location and then clear (i.e., set to FALSE) their suspicions and accusations to start 
the next local diagnostic interval. During Collective Diagnosis, the effective accusations are formed by a 
bitwise OR function of the accusations stored in the temporary memory location and any newly generated 
accusations. The accusations in temporary memory are discarded at the end of Collective Diagnosis and 
only accusations from the current local diagnostic interval are used from that point on. The overlap in the 
use of accusations from consecutive local diagnostic intervals ensures that the clique remains guarded 
from untrustworthy nodes during Collective Diagnosis and that the local accusations processed during 
Collective Diagnosis are based on observations gathered during the full local diagnostic interval. 
Figure 4.1: Diagnostic intervals for the Clique Preservation mode 
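
The handling of accusations around Collective Diagnosis can be sketched as follows. This is only an illustration of the copy, clear, and OR steps described above; the class name, method names, and list-of-Booleans representation are assumptions made for this sketch and are not taken from the ROBUS design. Suspicions, which are cleared at the same point, are omitted for brevity.

```python
class LocalDiagnosis:
    """Hypothetical per-node store with one Boolean accusation per defendant."""

    def __init__(self, n_nodes):
        self.n_nodes = n_nodes
        self.accusations = [False] * n_nodes   # current local diagnostic interval
        self.saved = [False] * n_nodes         # temporary copy used during Collective Diagnosis
        self.in_collective = False

    def accuse(self, defendant):
        self.accusations[defendant] = True

    def begin_collective_diagnosis(self):
        # Copy the current accusations to temporary memory, then clear them
        # to start the next local diagnostic interval.
        self.saved = list(self.accusations)
        self.accusations = [False] * self.n_nodes
        self.in_collective = True
        return list(self.saved)                # accusations submitted for processing

    def end_collective_diagnosis(self):
        # The temporary copy is discarded at the end of Collective Diagnosis.
        self.saved = [False] * self.n_nodes
        self.in_collective = False

    def effective_accusations(self):
        # During Collective Diagnosis: OR of the saved and newly generated accusations.
        if self.in_collective:
            return [a or s for a, s in zip(self.accusations, self.saved)]
        return list(self.accusations)
```

In this sketch, a defendant accused in the previous interval stays distrusted throughout Collective Diagnosis even though the working accusations have already been cleared.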

The processing done by the diagnostic system during the Clique Join mode is essentially the same as 
for the Clique Preservation mode. The only difference is that a node in the Clique Join mode expects not 
to be a member of the clique when it enters this mode and to be part of it at the end. 

In the Clique Detection mode, a node tries to determine if a clique is operating on the bus. This is 
accomplished by operating as if a clique were present and attempting to acquire its time and diagnostic 
state, while simultaneously monitoring for indications that a valid clique is not present. It is assumed that 
at any time there is at most one clique present on the bus. The error processing by a node in this mode is 
based on the assumption that the node is working properly unless there is unequivocal evidence that it is 
not. In the absence of such evidence, all detected errors involving other nodes or the clique are blamed on 
them. Figure 4.2 shows the minor modes and the range of the diagnostic intervals for the Clique 
Detection mode. A node clears all its diagnostic state variables at the beginning of this mode, and the 
convictions are held constant until an update is received during Collective Diagnosis Acquisition. Only 
local diagnostic information is used to assess the nodes and the clique. During Local Diagnosis 
Acquisition, a node allows its diagnostic system to perform a preliminary assessment of the bus before the 
operational system attempts to synchronize to the clique during the Synchronization Acquisition mode. 
During the Collective Diagnosis mode, a node loads the convictions computed by the clique. The 
operation of the diagnostic system during Collective Diagnosis is the same as described previously for 
nodes in Clique Preservation mode. 

In the Clique Initialization mode, a group of nodes tries to form a new clique after failing to find a 
valid clique in the Clique Detection mode. Figure 4.3 shows the minor modes and diagnostic cycles. 
None of the diagnostic information gathered during Clique Detection is used in this mode. All of the 
diagnostic state variables are cleared at the beginning of this mode, and the convictions are held constant 
until an update is computed. As in Clique Detection mode, in this mode only local diagnostic information 
is used to assess the nodes and the clique. During Initial Diagnosis, the nodes determine the initial set of 
nodes that will be considered to form the new clique. These nodes agree on the local time during the 
Initial Synchronization mode, and new convictions are computed during Collective Diagnosis. The 
operation of the diagnostic system during Collective Diagnosis is the same as described previously for 
nodes in Clique Preservation mode. 
Figure 4.2: Diagnostic intervals for the Clique Detection mode 
Figure 4.3: Diagnostic intervals for the Clique Initialization mode 

In the Self-Test mode, the operational system exercises the circuitry of a node while the diagnostic 
system monitors for errors. No diagnostic information about other nodes is gathered during this mode. 


4.2.3. Suspicion generation 

For some operational scenarios, an observer is able to detect that an error has occurred in a particular 
communication path that starts at a source node of the same kind as the observer, passes through an 
intermediate node of the opposite kind, and ends at the observer. The observer always assumes that its 
own operation is correct unless there is evidence to the contrary. In the absence of other error syndromes 
directly incriminating the source node or the intermediate node, the observer cannot determine which is 
responsible for the detected error. Therefore, the best that the observer can do is to raise suspicions 
against both of them. For this version of the ROBUS, suspicions are generated only for such scenarios. 

Figure 4.4 illustrates the organization of suspicions in a two-dimensional matrix in which the rows 
correspond to the nodes of the same kind as the observer and the columns correspond to the nodes of the 
opposite kind. 0 and Q. denote the number of nodes of the same kind and of the opposite kind, 
respectively. Every S_{i,j} cell is a Boolean variable. A single instance of suspicion against a pair of nodes 
is sufficient to assert (i.e., set to TRUE) the corresponding cell in the matrix. 
4.2.4. Accusation generation 

A node has accusations variables for every node on the bus. Figure 4.5 illustrates the organization of 
the accusations variables. Every A_{SK,i} or A_{OK,j} cell is a Boolean variable that is asserted when the 
corresponding node is accused. An observer accuses a defendant when it determines that the defendant is 
responsible for one or more detected errors. 
Figure 4.5: Accusations variables 

There are two accusation generation mechanisms. For most error checks, only one defendant can be 
blamed and a single detected error is sufficient evidence to accuse the defendant. The other accusation 
generation mechanism is the processing of the suspicions matrix. This generates accusations using bit 
vote operations for each row and each column of the suspicions matrix. Only rows and columns 
corresponding to trusted nodes are considered. Figure 4.6 illustrates this for a system with 4 nodes of the 
same kind and 3 nodes of the opposite kind. A_{OK,i}|susp denotes the suspicions-based accusation result for 
the i-th node of the opposite kind. These accusations are generated by bit voting considering only the 
suspicions in the rows corresponding to trusted nodes of the same kind. Suspicions-based accusations 
against nodes of the same kind are similarly determined. A bitwise Boolean OR operation for each 
defendant is used to combine the corresponding results of the accusation generation mechanisms. 

The accusations variables remain constant during the execution of protocol processes and are updated 
after the completion of the protocol process in which the incriminating evidence is found. The suspicion 
matrix is processed only at the end of the PE Communication mode. The accusations and suspicions are 
cleared at the end of the local diagnostic interval. 


                       Opposite Kind
                   1         2         3
Same Kind   1   S_{1,1}   S_{1,2}   S_{1,3}
            2   S_{2,1}   S_{2,2}   S_{2,3}
            3   S_{3,1}   S_{3,2}   S_{3,3}
            4   S_{4,1}   S_{4,2}   S_{4,3}

Bit vote operations for each column -> A_{OK,1}|susp   A_{OK,2}|susp   A_{OK,3}|susp

(One same-kind row is marked in the figure: "Not trusted; not considered in the voting".)


Figure 4.6: Example of the generation of suspicions-based accusations for nodes of the opposite kind 
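
The processing of the suspicions matrix can be sketched as below. The text does not define the bit vote operation in this section, so the sketch assumes it is a strict majority over the eligible Boolean inputs, with a FALSE result when there are no eligible inputs. The function names and the choice to report FALSE for untrusted rows and columns are likewise assumptions made for illustration.

```python
def bit_vote(bits):
    """Assumed rule: TRUE only if a strict majority of the eligible bits is TRUE."""
    bits = list(bits)
    return 2 * sum(bits) > len(bits)


def suspicion_accusations(S, trusted_same, trusted_opp):
    """Derive suspicions-based accusations from the suspicions matrix S.

    S[i][j] is TRUE when same-kind node i and opposite-kind node j are jointly
    suspected. Only rows and columns of trusted nodes take part in the votes.
    """
    n_same = len(S)
    n_opp = len(S[0]) if S else 0
    # Column votes: accusations against nodes of the opposite kind.
    acc_opp = [
        trusted_opp[j]
        and bit_vote(S[i][j] for i in range(n_same) if trusted_same[i])
        for j in range(n_opp)
    ]
    # Row votes: accusations against nodes of the same kind.
    acc_same = [
        trusted_same[i]
        and bit_vote(S[i][j] for j in range(n_opp) if trusted_opp[j])
        for i in range(n_same)
    ]
    return acc_same, acc_opp


# Example with 4 same-kind nodes and 3 opposite-kind nodes; same-kind node 3
# (index 2) is not trusted, so its row is excluded from the column votes.
S = [
    [False, True, False],
    [False, True, False],
    [True, True, True],
    [False, False, True],
]
acc_same, acc_opp = suspicion_accusations(S, [True, True, False, True], [True, True, True])
```

Under these assumptions only the second opposite-kind node is accused, because two of the three trusted rows raise a suspicion in its column.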


4.2.5. Conviction generation 

Figure 4.7 illustrates the organization of the convictions variables. Every C_{SK,i} or C_{OK,j} cell is a 
Boolean variable that is asserted when the corresponding node is convicted. The convictions are 
generated by the Collective Diagnosis protocol. Bit vote operations for accusations from eligible voters 
are used to determine the conviction result for each defendant. The Collective Diagnosis protocol is 
presented in Section 5. The conviction variables remain constant until updated at the end of the collective 
diagnostic interval. 
Figure 4.7: Convictions variables 


4.2.6. Trust 

A node uses the results of local and collective diagnoses to determine which nodes to trust. The 
general rule to determine trust is that a node is trusted if it is not accused and not convicted. This rule 
applies to all operational modes. Both of these diagnosis results are available to nodes in the Clique Join 
and Clique Preservation modes. During the Clique Detection and Clique Initialization modes, only local 
diagnostic information is available until the end of the collective diagnostic cycle. Since a node must be 
trusted unless there is evidence that it is untrustworthy, nodes operating in these modes must clear their 
convictions variables and, in effect, use only the accusations to determine trust. 
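
As a minimal sketch of the trust rule, the function below marks a node as trusted when it is neither accused nor convicted. The function name and the collective_available flag are hypothetical; the flag models the case described above in which convictions are cleared because no collective result is available yet.

```python
def trusted_set(accused, convicted, collective_available=True):
    """Return one Boolean per node: trusted = not accused and not convicted."""
    if not collective_available:
        # Clique Detection / Clique Initialization before the first collective
        # result: convictions are cleared, so only accusations determine trust.
        convicted = [False] * len(accused)
    return [not a and not c for a, c in zip(accused, convicted)]
```

For example, trusted_set([False, True, False], [False, False, True]) marks only the first node as trusted.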


4.2.7. Voter eligibility 

Dynamic voting is applied to received messages and gathered suspicions. Distrusted nodes are not 
allowed to participate in voting operations. In addition, detected input errors and message content may be 
considered to determine voter eligibility for received messages. The eligibility conditions are 
independently determined for each instance of voting. 
4.2.8. Local failure and bus failure conditions 


The trigger for a node to stop its current activity and transition to the Self-Test mode is the detection 
of a local failure or a bus failure. The status of the bus is always determined in terms of the status of the 
clique. The application of error checks and the interpretation of their syndromes are dependent on the 
activities performed by the operational system. For most checks, a detected error can be caused by a 
failure of the observer or the observed object, which is either a particular defendant or a clique as a whole. 
The assessment of individual nodes is always performed with the assumption that the observer and the 
clique are working properly. In general, an observer can assess its own status by using clique-dependent 
checks for which a clique is assumed to be operating properly, or by using clique-independent checks that 
monitor the observer for conformance with its design specification. The latter type of self-checks is an 
optional feature whose use should be decided with consideration given to factors like implementation 
complexity and the application domain. All of the self-checks described in this document are clique- 
dependent checks. Thus, there is no completely unambiguous way for a node to differentiate between a 
local failure and a clique failure. 

In the Clique Initialization, Clique Join, and Clique Preservation modes there is no need to distinguish 
between a local failure and a clique failure because the detection of either always triggers a fail-stop 
response. 

In the Self-Test mode, only local operations are performed and no information from the rest of the bus 
is processed. In this mode, any detected error is presumed to be due to a local failure. 

In the Clique Detection mode, three events trigger abrupt transitions: local failure, clique failure, and 
no clique found. In this mode, no distinction is made between the case in which a clique is not present 
when the observer transitions to this mode and the case in which a clique experiences a failure while the 
observer is in this mode. Both cases are considered indications that a clique is not present. In addition, a 
node in this mode assesses a clique based on the assumption that the node itself is operating properly 
unless there is an unambiguous indication that it is not. Given that the operations defined for this mode 
do not allow a node to observe its own behavior, the only such indication is an accusation against itself. 
This is discussed in detail in Section 7. 

A clique is diagnosed based on its size and the validity and agreement of its processing results. The 
smallest clique allowed is composed of one BIU and one RMU. To assess a clique for size, no distinction 
is made between BIUs or between RMUs. A node uses its trusted set and the results of Collective 
Diagnosis to determine the membership of a clique. Most protocol checks monitor a clique for validity 
and agreement in state variables and processing results. For some input voting operations, an observer 
expects agreement among a majority of the eligible voters, or it knows in advance what the result of the 
voting should be. 


4.2.9. Unexpected messages 

For most modes of operation, a node expects to receive messages during particular local-time 
intervals. This includes all of the synchronous protocols and the Synchronization Preservation protocol. 
The reception of a message when none is expected is an indication of a timing error. In agreement with 
the accusation generation policy, the detection of such an error should result in an accusation against the 
corresponding node of the opposite kind. 
In order to ensure proper coordinated action among the trustworthy members of a clique, the design of 
the ROBUS nodes must ensure that untrustworthy nodes do not have a chance to influence the time at 
which individual trustworthy nodes update their accusations. This is realized by updating the accusations 
only at predetermined points in time. For this version of the ROBUS, this is done at the beginning and at 
the end of protocol processes. Thus, if an unexpected message is detected in between protocol processes, 
the corresponding accusation becomes effective at the start of the next process in which input messages 
are expected. If the error detection occurs during a process, the accusation becomes effective after the 
completion of the process. 
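
The deferred update of accusations can be sketched as a simple latch. The class and method names are hypothetical, and the sketch simplifies the rule by applying pending accusations at every process boundary rather than only at the start of the next process in which input messages are expected.

```python
class AccusationLatch:
    """Hypothetical latch: errors are recorded at any time, but accusations
    change only at predetermined points (process boundaries)."""

    def __init__(self, n_nodes):
        self.accusations = [False] * n_nodes   # values visible to the protocols
        self.pending = [False] * n_nodes       # evidence not yet in effect

    def report_error(self, defendant):
        # May be called at any time, including between protocol processes.
        self.pending[defendant] = True

    def process_boundary(self):
        # Called at the beginning and at the end of a protocol process.
        self.accusations = [a or p for a, p in zip(self.accusations, self.pending)]
        self.pending = [False] * len(self.pending)
```

Because the visible accusations change only inside process_boundary, the timing of an untrustworthy node's message cannot shift the moment at which a trustworthy node starts distrusting it.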
5. Clique Preservation 


Figure 5.1 illustrates the minor mode transitions for the Clique Preservation major mode. A node 
enters this mode after confirming that it has been admitted to the clique at the end of a diagnostic cycle in 
Clique Initialization or Clique Join modes. In the Clique Preservation mode, a node participates in the 
delivery of services to the PEs in a continuous loop. Each one of these services is realized with a 
specialized distributed protocol. The only exit condition for the Clique Preservation mode is the detection 
of a local failure or a bus failure. 

The local failure or bus failure conditions are either protocol-independent or protocol-dependent. The 
protocol-independent conditions include each of the following: number of trusted BIUs equal to zero, 
number of trusted RMUs equal to zero, and assertion of an accusation against self. The protocol- 
dependent conditions are described with the corresponding protocols. 
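
The protocol-independent conditions listed above can be expressed as a single predicate. This is a sketch only; the function name and argument layout (one Boolean per node in each trusted list) are assumptions made for illustration.

```python
def protocol_independent_failure(trusted_bius, trusted_rmus, self_accused):
    """TRUE when any protocol-independent local/bus failure condition holds."""
    no_trusted_bius = not any(trusted_bius)   # number of trusted BIUs equals zero
    no_trusted_rmus = not any(trusted_rmus)   # number of trusted RMUs equals zero
    return no_trusted_bius or no_trusted_rmus or self_accused
```

A node for which this predicate is TRUE would leave the Clique Preservation mode, since detection of a local failure or a bus failure is the only exit condition.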


[Figure 5.1 diagram not reproduced; its entry arrow is labeled "From Clique Initialization or Clique Join".] 



Figure 5.1: Minor-mode transitions for Clique Preservation mode 


5.1. Schedule Update 

In the Schedule Update mode, the PEs submit their desired schedule for the next PE Broadcast 
communication service. Ideally, the PEs have agreement on the schedule before they deliver it to the 
ROBUS. Let N denote the number of BIUs, which is assumed to equal the number of PEs connected to 
the bus. The desired schedule is delivered by each PE to its BIU in the form of N consecutive messages 
with the positions in the sequence corresponding to the identification numbers of the PEs and the payload 
fields of the messages indicating the desired number of messages to be broadcast. The submitted 
schedule messages are processed using an agreement protocol to ensure that all of the clique members and 
the PEs agree on the result for each PE. The protocol is applied independently N times, with each 
iteration processing the messages delivered by the PEs that indicate the number of messages to be 
broadcast by a particular PE. After all the messages have been processed, the ROBUS nodes individually 
assess the resulting schedule. Since there is agreement on the protocol results and the nodes apply the 
same assessment rules, their assessment results are guaranteed to be the same. 
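
The per-PE structure of the schedule update can be sketched as follows. Each PE submits N values, and iteration i gathers the i-th value from every PE and passes it to an agreement step. The agreement step here is a plain strict-majority stand-in, not the multi-stage Schedule Update protocol described in Section 5.1.1; all names are hypothetical.

```python
from collections import Counter

PE_ERROR = "PE_ERROR"  # stand-in for RM(SPECIAL, PE_ERROR)


def majority_or_error(values):
    """Stand-in agreement step: strict majority value, else PE_ERROR."""
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else PE_ERROR


def update_schedule(submissions, agree=majority_or_error):
    """submissions[pe][i] is the number of messages PE `pe` requests for PE i.

    The agreement step is applied independently N times, once per PE.
    """
    n = len(submissions)
    return [agree([submissions[pe][i] for pe in range(n)]) for i in range(n)]
```

With three PEs submitting [2, 1, 0], [2, 1, 0], and [2, 5, 0], this sketch yields the schedule [2, 1, 0], because the single disagreeing entry is outvoted.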

Note that this protocol is executed by nodes in the Clique Preservation and Clique Join modes. 


5.1.1. Schedule Update protocol 

The Schedule Update protocol was developed for ROBUS based on the theory presented in [Miner 
04]. Figure 5.2 shows the message flow graph. The labels inside the circles identify the processes 
executed by the ROBUS nodes. This protocol is a synchronous protocol implemented using synchronous 
communication. 


[Figure 5.2 diagram not reproduced; its lanes are labeled PEs, BIUs, and RMUs.] 


Figure 5.2: Message flow graph for the Schedule Update protocol 

The following is the description of the basic Schedule Update protocol. This protocol determines the 
number of messages to be broadcast by the i-th PE according to the schedule messages submitted to the 
ROBUS by the PEs. The sublevels in the description specify the checks to be performed during particular 
protocol steps. All of the checks with specific expectations will result in errors being signaled if the 
expectations are not met. Note that the ROBUS messages are expressed in functional notation: RM(tag, 
payload). A message with an arbitrary payload field is indicated by a * symbol in the payload location. 


Process P0: BIUs 

1. If the PE message is valid, broadcast that message. Otherwise, broadcast 
RM(SPECIAL, PE_ERROR). 


Process P1: RMUs 

1. Receive the messages from the BIUs. 

1.1. Communication checks for each BIU: 

1.1.1. Expecting no link errors 
1.2. In-line checks for each BIU: 

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message 

1.2.3. Expected content: RM(SPECIAL, PE_ERROR), RM(DATA, *) 

2. For each BIU, if there was a reception error, the message content is ineligible, or 
the BIU is not trusted, then the BIU is not an eligible voter. 

2.1. Eligible content for each BIU: RM(DATA, *) 

3. Perform a word vote on the messages from eligible voters. If there is no majority 
or there are no eligible voters, then convert the result to RM(SPECIAL, PE_ERROR). 

4. Broadcast the result of the vote. 


Process P2: BIUs 

1. Receive the messages from the RMUs. 

1.1. Communication checks for each RMU: 

1.1.1. Expecting no link errors 

1.2. In-line checks for each RMU: 

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message 

1.2.3. Expected content: RM(SPECIAL, PE_ERROR), RM(DATA, *) 

2. For each RMU, if there was a reception error, the message content is ineligible, or 
the RMU is not trusted, then the RMU is not an eligible voter. 

2.1. Eligible content for each RMU: RM(SPECIAL, PE_ERROR), RM(DATA, *) 

2.2. Protocol checks: 

2.2.1. Expecting that at least one RMU is an eligible voter 

3. Perform a word vote on the messages from eligible voters. If there is no majority, 
then convert the result to RM(SPECIAL, PE_ERROR). 

4. Broadcast the result. 

5. Send the result of the vote to the attached PE. 


Process P3: RMUs 

1. Receive the messages from the BIUs. 

1.1. Communication checks for each BIU: 

1.1.1. Expecting no link errors 

1.2. In-line checks for each BIU: 

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message 

1.2.3. Expected content: RM(SPECIAL, PE_ERROR), RM(DATA, *) 
2. For each BIU, if there was a reception error, the message content is ineligible, or 
the BIU is not trusted, then the BIU is not an eligible voter. 

2.1. Eligible content for each BIU: RM(SPECIAL, PE_ERROR), RM(DATA, *) 

2.2. Protocol checks: 

2.2.1. Expecting that at least one BIU is an eligible voter 

3. Perform a word vote on the messages from eligible voters. 

3.1. Cross-lane checks for each BIU: 

3.1.1. Expecting agreement with the result of the vote 

3.2. Protocol checks: 

3.2.1. Expecting agreement among a majority of the eligible voters 

4. Broadcast the result of the vote. 

5. The result of the vote is the protocol result for the i-th PE. 


Process P4: BIUs 

1. Receive the messages from the RMUs. 

1.1. Communication checks for each RMU: 

1.1.1. Expecting no link errors 

1.2. In-line checks for each RMU: 

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message

1.2.3. Expected content: RM(SPECIAL, PE_ERROR), RM(DATA, *) 

2. For each RMU, if there was a reception error, the message content is ineligible, or 

the RMU is not trusted, then the RMU is not an eligible voter. 

2.1. Eligible content for each RMU: RM(SPECIAL, PE_ERROR), RM(DATA, *) 

2.2. Protocol checks: 

2.2.1. Expecting that at least one RMU is an eligible voter 

3. Perform a word vote on the messages from eligible voters. 

3.1. Cross-lane checks for each RMU: 

3.1.1. Expecting agreement with the result of the vote 

3.2. Protocol checks: 

3.2.1. Expecting agreement among a majority of the eligible voters 

4. The result of the vote is the protocol result for the i-th PE. 


A reception error, also called an input error, is a violation of the expectations for the communication or 
in-line checks. An error detection by any of these checks in processes P1 through P4 is sufficient
evidence to accuse the corresponding node of the opposite kind.

Note that in process P1 the expected content and the eligible content are not the same. The content
RM(SPECIAL, PE_ERROR) is a valid input, but it is not eligible to determine the voting result.

The cross-lane checks in processes P3 and P4 compare the result of the vote with the received input
from each node of the opposite kind. An error detection by these checks is sufficient evidence to accuse 
the corresponding node of the opposite kind. 

Two types of protocol checks are used. In processes P2 through P4, it is expected that at least one 
node of the opposite kind is eligible to vote. In processes P3 and P4, it is expected to have agreement on 
received message content for a majority of the eligible voters. An error detection by any of these checks 
is an indication of a clique failure. 
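
The eligibility filtering, word vote, and cross-lane checks that recur in the processes above might be sketched as follows. This is a minimal illustration, not the ROBUS implementation: the tuple encoding of RM(type, payload), the function names, and the reading of "majority" as a strict majority of the eligible voters are assumptions made for the example.

```python
from collections import Counter

# Hypothetical encoding of RM(type, payload) as a (type, payload) tuple.
PE_ERROR = ("SPECIAL", "PE_ERROR")


def eligible_voters(received, reception_error, trusted, is_eligible_content):
    """Step 2: drop senders with a reception error, ineligible content, or no trust."""
    return [
        node for node, msg in received.items()
        if node not in reception_error
        and node in trusted
        and is_eligible_content(msg)
    ]


def word_vote(received, voters):
    """Step 3: value held by a strict majority of the voters, else None."""
    if not voters:
        return None  # violates the "at least one eligible voter" protocol check
    value, count = Counter(received[v] for v in voters).most_common(1)[0]
    return value if 2 * count > len(voters) else None


def cross_lane_errors(received, voters, result):
    """Step 3.1: eligible voters whose input disagrees with the vote result."""
    return [v for v in voters if received[v] != result]


def biu_p2_result(received, voters):
    """Process P2 conversion: no majority becomes RM(SPECIAL, PE_ERROR)."""
    result = word_vote(received, voters)
    return PE_ERROR if result is None else result
```

In this sketch, a node returned by `cross_lane_errors` would be accused, while a `None` from `word_vote` in processes P3 or P4 would correspond to the clique-failure indication described above.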


5.1.2. Schedule update assessment 

The Schedule Update protocol determines the number of messages to be broadcast by a particular PE. 
The result of the protocol can be a DATA message with the number of scheduled messages in the payload 
field, or a PE_ERROR message indicating that the protocol was unable to determine a valid number of 
messages for the PE being considered. 

The assessment of the schedule update produces one of three results: invalid, valid, or zero. A 
schedule is invalid if the result of the Schedule Update protocol is PE_ERROR for any PE, or if the total 
number of scheduled messages exceeds the maximum number of messages that the ROBUS can process 
during the PE Communication mode. This maximum number is a constant during run-time and is 
determined by doing a timing analysis of the bus implementation. A schedule is valid if it is not invalid. 
A zero schedule is a special case of a valid schedule in which the number of scheduled messages is zero
for every PE. 

After completing their assessment, the BIUs send a SPECIAL message to the PEs to inform them of 
the result. The payload field is one of the following: INVALID_SCHEDULE, VALID_SCHEDULE, or
ZERO_SCHEDULE.
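
The assessment rule can be written compactly. The following is a sketch under stated assumptions: the per-PE protocol results are modeled as (type, payload) tuples, and `assess_schedule` and `max_messages` are hypothetical names for the assessment step and for the constant obtained from the timing analysis.

```python
# Hypothetical encoding of the protocol result for one PE.
PE_ERROR = ("SPECIAL", "PE_ERROR")


def assess_schedule(results, max_messages):
    """Classify a schedule update from the per-PE Schedule Update results.

    results: list with one entry per PE, either ("DATA", n) or PE_ERROR.
    max_messages: largest total the bus can process in PE Communication mode.
    """
    # Any PE_ERROR makes the whole schedule invalid.
    if any(r == PE_ERROR for r in results):
        return "INVALID_SCHEDULE"
    counts = [payload for _kind, payload in results]
    # Exceeding the bus capacity also makes the schedule invalid.
    if sum(counts) > max_messages:
        return "INVALID_SCHEDULE"
    # A zero schedule is the special valid case with no messages at all.
    if all(c == 0 for c in counts):
        return "ZERO_SCHEDULE"
    return "VALID_SCHEDULE"
```

For example, `assess_schedule([("DATA", 2), ("DATA", 3)], max_messages=8)` yields `"VALID_SCHEDULE"`, while replacing either entry with `PE_ERROR` yields `"INVALID_SCHEDULE"`.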


5.1.3. Application of the schedule update assessment 

The schedule for the next PE broadcast session depends on the result of the schedule update 
assessment. If the result is valid, the new schedule is used. If the result is zero, there will be no bus 
activity during the next PE Communication mode. If the result is invalid, a default schedule is used. The 
default schedule is constant during run-time and is known to all the ROBUS nodes and the PEs. For this 
version of the ROBUS, the default schedule allocates the same number of transmissions for each PE. 
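
The selection of the schedule for the next PE broadcast session could be sketched as below. The function names and the list-of-counts representation of a schedule are assumptions for illustration only.

```python
def uniform_default(num_pes, per_pe):
    """Default schedule: the same number of transmissions for each PE."""
    return [per_pe] * num_pes


def next_schedule(assessment, proposed, default):
    """Pick the schedule for the next PE Communication mode.

    assessment: "VALID_SCHEDULE", "ZERO_SCHEDULE", or "INVALID_SCHEDULE".
    proposed:   per-PE message counts from the Schedule Update protocol.
    default:    run-time constant schedule known to all nodes and PEs.
    """
    if assessment == "VALID_SCHEDULE":
        return list(proposed)          # use the new schedule
    if assessment == "ZERO_SCHEDULE":
        return [0] * len(default)      # no bus activity in the next session
    return list(default)               # invalid: fall back to the default
```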


5.2. PE Communication 

In the PE Communication mode, the ROBUS broadcasts PE messages according to the 
communication schedule. The access pattern is a time-indexed round-robin sequence in which the 
interval between the send times of consecutive messages is constant regardless of whether they are from 
the same or different PEs. The BIUs (and their attached PEs) access the bus in ascending order according 
to their identification numbers. If a particular PE is not scheduled to send messages, the next one that is 
scheduled will automatically take its place to ensure that the proper interval between messages is 
maintained. Each PE message is broadcast using an agreement protocol to ensure that the PEs receive the 
same result. 
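
One possible reading of this access pattern is sketched below. It assumes that each scheduled PE sends one message per round, that rounds repeat until every PE has exhausted its allocation, and that PE identification numbers start at zero; none of these details is confirmed by the text, and `send_order` and `slot_interval` are hypothetical names.

```python
def send_order(schedule, slot_interval=1):
    """Return (send_time, pe_id) pairs for a time-indexed round-robin.

    schedule: per-PE message counts, indexed by PE identification number.
    slot_interval: constant interval between consecutive send times.
    """
    remaining = list(schedule)
    order = []
    t = 0
    while any(remaining):
        # Visit the PEs in ascending ID order; a PE with nothing left to
        # send is skipped, so the next scheduled PE takes its slot.
        for pe, left in enumerate(remaining):
            if left > 0:
                order.append((t, pe))
                remaining[pe] -= 1
                t += slot_interval
    return order
```

With `schedule = [2, 0, 1]`, PE 1 never gets a slot and the send times stay evenly spaced even though PE 0 sends twice.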

After all the scheduled messages have been processed, the BIUs and RMUs exchange their
accumulated accusations against nodes of the opposite kind. This exchange enhances the reconfiguration 
capabilities of the bus by ensuring that the required diagnosis properties are satisfied when processing the 
accumulated suspicions. This is explained in detail in Appendix F. 

The suspicions matrix is processed following the completion of the Accusation Exchange protocol. 
Section 4 of this document describes this operation. 

Note that these protocols are executed by nodes in the Clique Preservation and Clique Join modes. 


5.2.1. PE Broadcast protocol 

The PE Broadcast protocol was inspired by the source congruency protocol presented in [Smith 84] 
for Draper Lab’s Fault Tolerant Processor (FTP) architecture. Figure 5.3 illustrates the message flow
graph. This protocol is a synchronous protocol implemented using synchronous communication. The
scheduled message is generated by the source PE and relayed by the attached BIU (a.k.a. the source BIU).
The result of the protocol is received by all the PEs attached to BIUs that are part of the clique or are in
Clique Join mode. 

The protocol is an agreement protocol with embedded diagnostic processing. If the source PE sends a 
valid message and its attached BIU is working properly, then the PEs will receive the message sent. A 
result of PE_ERROR indicates that the source BIU did not receive a valid message from the PE. 
SOURCE_ERROR indicates that there was an error caused by the source BIU. A result of 
NO_MAJORITY means that the source BIU did not broadcast the same message to each of the RMUs. 



Figure 5.3: Message flow graph for the PE Broadcast protocol 


The PE Broadcast protocol is presented next. 


Process P0: Source BIU

1. If the PE message is valid, broadcast that message. Otherwise, broadcast 
RM(SPECIAL, PE_ERROR). 


Process P1: RMUs

1. Receive the message from the source BIU. 

1.1. Communication checks for the source BIU: 

1.1.1. Expecting no link errors 

1.2. In-line checks for the source BIU: 

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message 

1.2.3. Expected content: RM(SPECIAL, PE_ERROR), RM(DATA, *) 

2. For the source BIU, if there was a reception error, the message content is ineligible, 

or there is an accusation against it, then the BIU is ineligible. 

2.1. Eligible content for the source BIU: RM(SPECIAL, PE_ERROR), RM(DATA, *)

3. If the source BIU is eligible, the result is the received message. Otherwise, the 

result is RM(SPECIAL, SOURCE_ERROR). 

4. Broadcast the result. 


Process P2: BIUs 

1. Receive the messages from the RMUs.

1.1. Communication checks for each RMU: 

1.1.1. Expecting no link errors 

1.2. In-line checks for each RMU: 

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message 

1.2.3. Expected content: RM(SPECIAL, PE_ERROR), RM(SPECIAL,

SOURCE_ERROR), RM(DATA, *)

2. For each RMU, if there was a reception error, the message content is ineligible, or 

the RMU is not trusted, then the RMU is not an eligible voter. 

2.1. Eligible content for each RMU: RM(SPECIAL, PE_ERROR), RM(SPECIAL,
SOURCE_ERROR), RM(DATA, *)

2.2. Protocol checks: 

2.2.1. Expecting that at least one RMU is an eligible voter 

3. Perform a word vote on the messages from eligible voters. If there is no majority, 

then convert the result to RM(SPECIAL, NO_MAJORITY).

3.1. Cross-lane checks for each RMU: 

3.1.1. Expecting agreement with the result of the vote 

3.2. Protocol check: 

3.2.1. Expecting that the vote result is not equal to RM(SPECIAL,

NO_MAJORITY) or RM(SPECIAL, SOURCE_ERROR)

3.3. Self-check: (source BIU only) 

3.3.1. Is the vote result equal to the message broadcast in process P0?


4. If the source is not trusted, then convert the result to RM(SPECIAL,

SOURCE_ERROR).

5. Send the result to the attached PE. 


In processes P1 and P2, a reception error is sufficient evidence to accuse the corresponding node of the
opposite kind. 

In process P2, it is expected that at least one node of the opposite kind is eligible to vote. An error
detection by this check is an indication of a clique failure. A vote result of RM(SPECIAL,
SOURCE_ERROR) or RM(SPECIAL, NO_MAJORITY) in process P2 is an indication of an error by the
source BIU and is sufficient evidence to accuse it. If the vote result in process P2 is not RM(SPECIAL,
SOURCE_ERROR) or RM(SPECIAL, NO_MAJORITY), and an error is detected for a cross-lane check,
then it is known that the corresponding RMU, the source BIU, or both are responsible for the
error. This is sufficient evidence to generate a suspicion against the RMU and the source BIU.

The self-check in process P2 is stated as a question because the expected result depends on the mode 
of the node executing this protocol process. For a node in the Clique Preservation mode or the Clique 
Join mode with its output enabled, it is expected that the result of the vote is equal to the message 
broadcast in process P0. For a node in the Clique Join mode with its output disabled, it is expected that
the message broadcast in process P0 and the voting result in process P2 do not match. (Furthermore, the
voting result should be SOURCE_ERROR for a node in the Clique Join mode with its output disabled.) 
Irrespective of the operating mode, a violation of the expectation for this check is an indication of a local 
failure. 
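
The relay step in process P1 and the vote and diagnostic classification in process P2 might be sketched as follows. This is an illustrative sketch, not the ROBUS implementation: the tuple encoding of messages, the function names, and the strict-majority rule are assumptions, and the handling of cross-lane disagreements when the vote itself indicates a source error is not specified in the text and is simply left empty here.

```python
from collections import Counter

# Hypothetical encoding of RM(SPECIAL, ...) messages as tuples.
PE_ERROR = ("SPECIAL", "PE_ERROR")
SOURCE_ERROR = ("SPECIAL", "SOURCE_ERROR")
NO_MAJORITY = ("SPECIAL", "NO_MAJORITY")


def rmu_relay(received, source_eligible):
    """Process P1, step 3: relay the message or flag a source error."""
    return received if source_eligible else SOURCE_ERROR


def biu_vote(rmu_msgs, voters, source_trusted=True):
    """Process P2, steps 3-4.

    Returns (result, accuse_source, suspect_rmus).
    """
    if not voters:
        raise ValueError("protocol check failed: no eligible voter")
    value, count = Counter(rmu_msgs[v] for v in voters).most_common(1)[0]
    vote = value if 2 * count > len(voters) else NO_MAJORITY
    # A NO_MAJORITY or SOURCE_ERROR vote is evidence against the source BIU.
    accuse_source = vote in (NO_MAJORITY, SOURCE_ERROR)
    # Otherwise, a disagreeing RMU yields a suspicion against that RMU
    # and the source BIU.
    suspects = [] if accuse_source else [v for v in voters if rmu_msgs[v] != vote]
    result = vote if source_trusted else SOURCE_ERROR
    return result, accuse_source, suspects
```

A source BIU that sends a different word to each of three RMUs produces a `NO_MAJORITY` vote in this sketch, whereas a single deviating RMU leaves the majority value intact and only appears in the suspect list.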


5.2.2. Accusation Exchange protocol 

Figure 5.4 shows the message flow graph for the Accusation Exchange protocol. Only BIUs and 
RMUs participate in this protocol. This protocol is a synchronous protocol implemented using 
synchronous communication. 


Figure 5.4: Message flow graph for the Accusation Exchange protocol


The description of the protocol is presented next. 


Process P0: BIUs

1. Broadcast the local accusations against the RMUs. 


Process P1: RMUs

1. Receive the messages from the BIUs. 

1.1. Communication checks for each BIU: 

1.1.1. Expecting no link errors 

1.2. In-line checks for each BIU: 

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message 

1.2.3. Expected content: RM(DATA, *) 

2. For each BIU, if there was a reception error, the message content is ineligible, or 

the BIU is not trusted, then the BIU is not an eligible voter.

2.1. Eligible content for each BIU: RM(DATA, *) 

2.2. Protocol checks: 

2.2.1. Expecting that at least one BIU is an eligible voter 

3. For each RMU defendant, perform a bit vote on the accusations received from 

eligible voters. 

3.1. Cross-lane checks for each BIU: 

3.1.1. Expecting agreement with the result of the vote for each RMU defendant 

4. Broadcast the local accusations against the BIUs. 

5. For each RMU defendant, merge the result of the bit vote with the local accusation 

value. The result is the new local accusation value for the defendant. 


Process P2: BIUs 

1. Receive the messages from the RMUs. 

1.1. Communication checks for each RMU:

1.1.1. Expecting no link errors

1.2. In-line checks for each RMU:

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message 

1.2.3. Expected content: RM(DATA, *) 

2. For each RMU, if there was a reception error, the message content is ineligible, or 

the RMU is not trusted, then the RMU is not an eligible voter. 

2.1. Eligible content for each RMU: RM(DATA, *) 

2.2. Protocol checks: 

2.2.1. Expecting that at least one RMU is an eligible voter

3. For each BIU defendant, perform a bit vote on the accusations received from

eligible voters. 

3.1. Cross-lane checks for each RMU: 

3.1.1. Expecting agreement with the result of the vote for each BIU defendant 

4. For each BIU defendant, merge the result of the bit vote with the local accusation 

value. The result is the new local accusation value for the defendant. 


Bitwise OR functions implement the merge operations in processes P1 and P2 between the results of
the bit votes and the local accusations. The updates to the accusations become effective after the
corresponding protocol process is complete.

In processes P1 and P2, a reception error is sufficient evidence to accuse the corresponding node of the
opposite kind.

In processes P1 and P2, it is expected that at least one node of the opposite kind is eligible to vote. An
error detection by this check is an indication of a clique failure.

For each bit vote operation in processes P1 and P2, if an error is detected for a cross-lane check, then it
is known that the defendant for the bit vote, the node of the opposite kind for which the error was 
detected, or both are untrustworthy. This is sufficient evidence to generate a suspicion against the 
defendant and the node of the opposite kind. 
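
The per-defendant bit vote, the OR merge, and the cross-lane suspicions can be sketched together. The code below is a hedged illustration: accusations are modeled as lists of 0/1 bits indexed by defendant, the names are hypothetical, and the bit vote is assumed to require a strict majority of ones (the tie-breaking rule is not given in this section).

```python
def bit_vote(bits):
    """Return 1 if a strict majority of the bits are 1, else 0 (assumed rule)."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def exchange_step(local, received, voters):
    """One receiving process (P1 or P2) of the Accusation Exchange protocol.

    local:    this node's accusation bits, one per defendant.
    received: {voter_id: accusation bits} from nodes of the opposite kind.
    voters:   IDs of the eligible voters.
    Returns (new_local, suspicions), where each suspicion is a
    (voter_id, defendant_id) pair that disagreed with the bit vote.
    """
    n = len(local)
    voted = [bit_vote([received[v][d] for v in voters]) for d in range(n)]
    # Merge: bitwise OR of the vote result with the local accusation.
    new_local = [own | vote for own, vote in zip(local, voted)]
    # Cross-lane check: a disagreeing voter casts suspicion on both the
    # voter and the defendant of that bit vote.
    suspicions = sorted(
        (v, d) for v in voters for d in range(n) if received[v][d] != voted[d]
    )
    return new_local, suspicions
```

Because the merge is an OR, an accusation already held locally is never cleared by the exchange in this sketch; it can only be added to.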


5.3. Synchronization Preservation 

In the Synchronization Preservation mode, the ROBUS executes a distributed synchronization 
protocol to re-synchronize the local-time clocks of the BIUs and RMUs, and to provide the PEs with a 
common time reference. It is assumed that the clique is already synchronized within some known 
precision bound. The protocol is intended to improve the precision by eliminating any relative skew 
introduced by the drift rate of the oscillators since the last synchronization. 

The Synchronization Preservation protocol was originally inspired by the clock synchronization 
protocol presented in [Srikanth 87], and it is an extension of the synchronization protocol in [Miner 02].
Figure 5.5 shows the message flow graph for the protocol. This protocol is a synchronization protocol
implemented using fixed-delay communication. The labels next to the arrows indicate the type of
SPECIAL message that is transmitted by the sending process. The protocol is an agreement generation
protocol with provisions for nodes trying to synchronize to the clique. 


Figure 5.5: Message flow graph for the Synchronization Preservation protocol


The description of the protocol is presented next. Agreement checks are performed by comparing the 
time between relevant events against an expected duration. A synchronization-reset timer implements a 
delay between a reference event and the resetting of the local-time clock. Appendix C provides additional 
information about the timing aspects of this protocol. 


Process P0: BIUs

1. If it is time to send, broadcast RM(SPECIAL, INIT).


Process P1: RMUs

1. Receive the messages from the BIUs. 

1.1. Communication checks for each BIU: 

1.1.1. Expecting no link errors 

1.2. In-line checks for each BIU:

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message 

1.2.3. Expected content: RM(SPECIAL, INIT) 

2. For each BIU, if there was a reception error, the message content is ineligible, or 

the BIU is not trusted, then the BIU is not an eligible voter.

2.1. Eligible content for each BIU: RM(SPECIAL, INIT) 

2.2. Protocol checks: 

2.2.1. Expecting that at least one BIU is an eligible voter 

3. Compute the Accept function for the messages received from the eligible voters. 

3.1. Cross-lane checks for each BIU: 

3.1.1. Expecting agreement with the result of the Accept function 

3.2. Protocol checks: 

3.2.1. Expecting agreement among a majority of the eligible voters 

4. When the Accept output is asserted, broadcast RM(SPECIAL, INIT). 


Process P2: BIUs

1. Receive the messages from the RMUs. 

1.1. Communication checks for each RMU: 

1.1.1. Expecting no link errors 

1.2. In-line checks for each RMU: 

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message 

1.2.3. Expected content: RM(SPECIAL, INIT) 

2. For each RMU, if there was a reception error, the message content is ineligible, or 

the RMU is not trusted, then the RMU is not an eligible voter. 

2.1. Eligible content for each RMU: RM(SPECIAL, INIT)

2.2. Protocol checks: 

2.2.1. Expecting that at least one RMU is an eligible voter 

3. Compute the Accept function for the messages received from the eligible voters. 

3.1. Cross-lane checks for each RMU: 

3.1.1. Expecting agreement with the result of the Accept function 

3.2. Protocol checks: 

3.2.1. Expecting agreement among a majority of the eligible voters 

4. When the Accept output is asserted, broadcast RM(SPECIAL, ECHO), start the

synchronization-reset timer, and send RM(SPECIAL, INIT) to the attached PE.

Process P3: RMUs 

1. Receive the messages from the BIUs. 

1.1. Communication checks for each BIU: 

1.1.1. Expecting no link errors 

1.2. In-line checks for each BIU: 

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message 

1.2.3. Expected content: RM(SPECIAL, ECHO)

2. For each BIU, if there was a reception error, the message content is ineligible, or 

the BIU is not trusted, then the BIU is not an eligible voter. 

2.1. Eligible content for each BIU: RM(SPECIAL, ECHO) 

2.2. Protocol checks: 

2.2.1. Expecting that at least one BIU is an eligible voter 

3. Compute the Accept function for the messages received from the eligible voters. 

3.1. Cross-lane checks for each BIU: 

3.1.1. Expecting agreement with the result of the Accept function 

3.2. Protocol checks: 

3.2.1. Expecting agreement among a majority of the eligible voters 


4. When the Accept output is asserted, broadcast RM(SPECIAL, ECHO) and start the
synchronization-reset timer. 


Process P4: BIUs 

1. Receive the messages from the RMUs.

1.1. Communication checks for each RMU: 

1.1.1. Expecting no link errors 

1.2. In-line checks for each RMU: 

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message 

1.2.3. Expected content: RM(SPECIAL, ECHO)

2. For each RMU, if there was a reception error, the message content is ineligible, or 

the RMU is not trusted, then the RMU is not an eligible voter. 

2.1. Eligible content for each RMU: RM(SPECIAL, ECHO)

2.2. Protocol checks:

2.2.1. Expecting that at least one RMU is an eligible voter

3. Compute the Accept function for the messages received from the eligible voters. 

3.1. Cross-lane checks for each RMU: 

3.1.1. Expecting agreement with the result of the Accept function 

3.2. Protocol checks: 

3.2.1. Expecting agreement among a majority of the eligible voters 


In processes P1 through P4, a reception error or a cross-lane check error is sufficient evidence to
accuse the corresponding node of the opposite kind.

In processes P1 through P4, it is expected that at least one node of the opposite kind is eligible to vote
and also to have agreement on received message timing for a majority of the eligible voters. An error 
detection by any of these checks is an indication of a clique failure. 


5.4. Collective Diagnosis 

In the Collective Diagnosis mode, the clique executes the Collective Diagnosis protocol to achieve a 
consistent diagnostic view of every ROBUS node in the system, including those that are not part of the 
clique. Two executions of the protocol are performed: one to diagnose RMUs and another to diagnose 
BIUs. Each protocol execution takes the accusations against each defendant of a particular kind from all 
of the nodes that are part of the clique, combines them to assess the trustworthiness of each defendant, 
and then distributes the resulting conviction results. Both executions of the Collective Diagnosis protocol 
use ROBUS messages formatted to carry diagnostic data corresponding to accusations or convictions. 
This message format is presented in Section 3. The Collective Diagnosis protocol is a synchronous 
protocol implemented using synchronous communication. 

The Collective Diagnosis protocol was originally inspired by the MAFT approach to on-line diagnosis
presented in [Walter 97]. Geser and Miner [Geser 04] developed the current version of the protocol,
which is optimized for the ROBUS. 

Note that the Collective Diagnosis protocol is executed by nodes in each of the major modes, except 
for the Self-Test mode. 


5.4.1. Collective Diagnosis protocol for RMU defendants 

Figure 5.6 shows the message flow graph for the Collective Diagnosis protocol applied to diagnose 
RMU defendants. The labels next to the arrows indicate the type of data transmitted by the message 
sources. 



Figure 5.6: Message flow graph for the Collective Diagnosis protocol for RMU defendants 

The description of the protocol is presented next. The merge operation in process P1 is a two-input
Boolean OR function. 


Process P0: BIUs 

1. Broadcast the local accusations against the RMUs. 


Process P1: RMUs

1. Receive the messages from the BIUs.

1.1. Communication checks for each BIU: 

1.1.1. Expecting no link errors 

1.2. In-line checks for each BIU: 

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message 

1.2.3. Expected content: RM(DATA, *) 

2. For each BIU, if there was a reception error, the message content is ineligible, or 

the BIU is not trusted, then the BIU is not an eligible voter. 

2.1. Eligible content for each BIU: RM(DATA, *)

2.2. Protocol checks: 

2.2.1. Expecting that at least one BIU is an eligible voter 

3. For each RMU defendant, perform a bit vote on the accusations received from 

eligible voters. 

4. For each RMU defendant, merge the result of the bit vote with the local accusation 

value. 

5. Broadcast the results of the merge operations as a single ROBUS message. 


Process P2: BIUs 

1. Receive the messages from the RMUs. 

1.1. Communication checks for each RMU: 

1.1.1. Expecting no link errors 

1.2. In-line checks for each RMU:

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message 

1.2.3. Expected content: RM(DATA, *) 

2. For each RMU, if there was a reception error, the message content is ineligible, or 

the RMU is not trusted, then the RMU is not an eligible voter. 

2.1. Eligible content for each RMU: RM(DATA, *) 

2.2. Protocol checks: 

2.2.1. Expecting that at least one RMU is an eligible voter 

3. For each RMU defendant, perform a bit vote on the accusations received from 

eligible voters. 

4. Broadcast the results of the bit vote operations as a single ROBUS message. 


Process P3: RMUs 

1. Receive the messages from the BIUs. 

1.1. Communication checks for each BIU: 

1.1.1. Expecting no link errors 

1.2. In-line checks for each BIU: 

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message 

1.2.3. Expected content: RM(DATA, *) 

2. For each BIU, if there was a reception error, the message content is ineligible, or 

the BIU is not trusted, then the BIU is not an eligible voter. 

2.1. Eligible content for each BIU: RM(DATA, *) 

2.2. Protocol checks: 

2.2.1. Expecting that at least one BIU is an eligible voter 

3. Perform a word vote on the messages received from eligible voters.

3.1. Cross-lane checks for each BIU: 

3.1.1. Expecting agreement with the result of the vote 

3.2. Protocol checks: 

3.2.1. Expecting agreement among a majority of the eligible voters 

4. Broadcast the result of the vote. 

5. The result of the vote has the updated convictions against the RMUs. 

5.1. Protocol checks: 

5.1.1. Expecting that not all of the RMUs are convicted 

5.2. Self-check: 

5.2.1. Is the local node convicted? 


Process P4: BIUs 

1. Receive the messages from the RMUs. 

1.1. Communication checks for each RMU: 

1.1.1. Expecting no link errors 

1.2. In-line checks for each RMU: 

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message

1.2.3. Expected content: RM(DATA, *) 

2. For each RMU, if there was a reception error, the message content is ineligible, or 

the RMU is not trusted, then the RMU is not an eligible voter. 

2.1. Eligible content for each RMU: RM(DATA, *) 

2.2. Protocol checks: 

2.2.1. Expecting that at least one RMU is an eligible voter 

3. Perform a word vote on the messages received from eligible voters. 

3.1. Cross-lane checks for each RMU: 

3.1.1. Expecting agreement with the result of the vote 

3.2. Protocol checks: 

3.2.1. Expecting agreement among a majority of the eligible voters 

3.2.2. Is the result of the vote equal to the result in process P2?

4. Send the result of the vote to the attached PE. 

5. The result of the vote has the updated convictions against the RMUs. 

5.1. Protocol checks: 

5.1.1. Expecting that not all of the RMUs are convicted 


In processes P1 through P4, a reception error is sufficient evidence to accuse the corresponding node
of the opposite kind. 

In processes P2 through P4, it is expected that at least one node of the opposite kind is eligible to vote. 
In processes P3 and P4, it is expected to have agreement on received message content for a majority of 
the eligible voters. An error detection by any of these checks is an indication of a clique failure. 

The cross-lane checks in processes P3 and P4 compare the result of the vote with the received input 
from each node of the opposite kind. An error detection by these checks is sufficient evidence to accuse 
the corresponding node of the opposite kind. 

In processes P3 and P4, it is expected that the result of the word vote does not indicate that all of the
RMUs are convicted. A violation of this expectation indicates a clique failure. 

The self-check in process P3 is stated as a question because the expected result depends on the mode 
of a node executing this protocol process. For a node in Clique Detection mode or in the first pass in
the Clique Join mode, the result of the vote should indicate a conviction against the node. For a node in 
Clique Initialization, Clique Preservation, or the second pass in the Clique Join mode, the expected result 
is that the node is not convicted. In Clique Detection mode, a detected error is interpreted as a clique 
failure. In all of the other cases, an error detection indicates a local failure or a clique failure. 

Nodes in the Clique Join, Clique Initialization, and Clique Preservation modes expect that, for each 
defendant, the results in processes P2 and P4 are equal. An error detection in any of these modes 
indicates a clique failure. For nodes in the Clique Detection mode, there is no expected relation between 
the results. 
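
The data flow of processes P0 through P2, followed by the "not all convicted" protocol check, might be sketched as below for the fault-free case. The sketch assumes that every sender is an eligible voter, that all links deliver messages correctly (so every BIU computes the same vector in process P2 and the word votes in processes P3 and P4 simply confirm it), and that a bit vote requires a strict majority of ones. The names and the list-of-bits representation are hypothetical.

```python
def bit_vote(bits):
    """Return 1 if a strict majority of the bits are 1, else 0 (assumed rule)."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def diagnose_rmus(biu_accusations, rmu_local):
    """Fault-free sketch of Collective Diagnosis for RMU defendants.

    biu_accusations[b][r]: BIU b's accusation bit against RMU r (process P0).
    rmu_local[m][r]:       RMU m's local accusation bit against RMU r.
    Returns (convictions, clique_failure).
    """
    n = len(rmu_local)
    # P1: each RMU bit-votes the BIU accusations per defendant ...
    voted = [bit_vote([acc[r] for acc in biu_accusations]) for r in range(n)]
    # ... and merges the result with its own accusation (Boolean OR).
    merged = [[voted[r] | rmu_local[m][r] for r in range(n)] for m in range(n)]
    # P2: each BIU bit-votes the merged vectors received from the RMUs.
    convictions = [bit_vote([row[r] for row in merged]) for r in range(n)]
    # P3/P4 protocol check: not all of the RMUs may be convicted.
    clique_failure = bool(convictions) and all(convictions)
    return convictions, clique_failure
```

In this sketch an RMU is convicted either when a majority of BIUs accuse it or when a majority of RMUs accuse it locally, which reflects the OR merge in process P1 followed by the second bit vote in process P2.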


5.4.2. Collective Diagnosis protocol for BIU defendants 

Figure 5.7 shows the message flow graph for the Collective Diagnosis protocol applied to diagnose 
BIU defendants. The labels next to the arrows indicate the type of data transmitted by the message 
sources. 


Figure 5.7: Message flow graph for the Collective Diagnosis protocol for BIU defendants


The description of the protocol is presented next. 


Process P0: RMUs

1. Broadcast the local accusations against the BIUs as a single diagnostic message.


Process P1: BIUs

1. Receive the messages from the RMUs. 

1.1. Communication checks for each RMU: 

1.1.1. Expecting no link errors 

1.2. In-line checks for each RMU: 

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message 

1.2.3. Expected content: RM(DATA, *) 

2. For each RMU, if there was a reception error, the message content is ineligible, or 

the RMU is not trusted, then the RMU is not an eligible voter. 

2.1. Eligible content for each RMU: RM(DATA, *) 

2.2. Protocol checks: 

2.2.1. Expecting that at least one RMU is an eligible voter 

3. For each defendant BIU, perform a bit vote on the accusations received from 

eligible voters. 

4. For each defendant BIU, merge the result of the bit vote with the local accusation. 

5. Broadcast the results of the merge operations as a single diagnostic message. 


Process P2: RMUs 

1. Receive the messages from the BIUs. 

1.1. Communication checks for each BIU: 

1.1.1. Expecting no link errors 

1.2. In-line checks for each BIU: 

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message 

1.2.3. Expected content: RM(DATA, *) 

2. For each BIU, if there was a reception error, the message content is ineligible, or 

the BIU is not trusted, then the BIU is not an eligible voter. 

2.1. Eligible content for each BIU: RM(DATA, *) 

2.2. Protocol checks: 

2.2.1. Expecting that at least one BIU is an eligible voter 

3. For each BIU defendant, perform a bit vote on the accusations received from 

eligible voters. 

4. Broadcast the results of the bit vote operations as a single diagnostic message. 


Process P3: BIUs

1. Receive the messages from the RMUs. 

1.1. Communication checks for each RMU: 

1.1.1. Expecting no link errors 

1.2. In-line checks for each RMU: 

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message 

1.2.3. Expected content: RM(DATA, *) 

2. For each RMU, if there was a reception error, the message content is ineligible, or 

the RMU is not trusted, then the RMU is not an eligible voter. 

2.1. Eligible content for each RMU: RM(DATA, *)

2.2. Protocol checks: 

2.2.1. Expecting that at least one RMU is an eligible voter

3. Perform a word vote on the messages received from eligible voters. 

3.1. Cross-lane checks for each RMU: 

3.1.1. Expecting agreement with the result of the vote

3.2. Protocol checks: 

3.2.1. Expecting agreement among a majority of the eligible voters 

4. Broadcast the result of the vote. 

5. Send the result of the vote to the attached PE. 

6. The result of the vote has the updated convictions against the BIUs. 

6.1. Protocol checks: 

6.1.1. Expecting that not all of the BIUs are convicted 

6.2. Self-check: 

6.2.1. Is the local node convicted? 


Process P4: RMUs 

1. Receive the messages from the BIUs. 

1.1. Communication checks for each BIU: 

1.1.1. Expecting no link errors 

1.2. In-line checks for each BIU: 

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message 

1.2.3. Expected content: RM(DATA, *) 

2. For each BIU, if there was a reception error, the message content is ineligible, or 

the BIU is not trusted, then the BIU is not an eligible voter. 

2.1. Eligible content for each BIU: RM(DATA, *) 

2.2. Protocol checks: 

2.2.1. Expecting that at least one BIU is an eligible voter 

3. Perform a word vote on the messages received from eligible voters.

3.1. Cross-lane checks for each BIU: 

3.1.1. Expecting agreement with the result of the vote 

3.2. Protocol checks: 

3.2.1. Expecting agreement among a majority of the eligible voters 

3.2.2. Is the result of the vote equal to the result in process P2? 

4. The result of the vote has the updated convictions against the BIUs. 
4.1. Protocol checks: 

4.1.1. Expecting that not all of the BIUs are convicted 


This protocol is essentially the same as the protocol for collective diagnosis of RMUs. All the 
comments stated in the previous section for the error checks of that protocol apply to this one as well. 
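
As an illustration of steps 2 and 3 of processes P3 and P4, the following Python sketch filters the voters by eligibility and then performs a majority word vote. The function name `word_vote`, the dictionary-based inputs, and the use of `None` to encode a reception error or ineligible content are assumptions made for illustration only; the protocol description above does not prescribe an implementation.

```python
from collections import Counter


def word_vote(received, trusted):
    """Majority word vote over eligible voters (illustrative sketch).

    received: dict mapping node id -> message word, or None on a reception
              error or ineligible content (hypothetical encoding).
    trusted:  set of node ids currently trusted by the local node.
    Returns (result, accused, clique_failure).
    """
    # Step 2: a voter is eligible only if it is trusted and its message
    # was received without error and with eligible content.
    eligible = {n: w for n, w in received.items()
                if n in trusted and w is not None}
    if not eligible:
        # Protocol check: at least one eligible voter is expected.
        return None, set(), True

    # Step 3: the result is the word sent by a strict majority of the
    # eligible voters.
    word, count = Counter(eligible.values()).most_common(1)[0]
    if 2 * count <= len(eligible):
        # Protocol check: no majority agreement among the eligible voters.
        return None, set(), True

    # Cross-lane check: eligible voters that disagree with the result.
    accused = {n for n, w in eligible.items() if w != word}
    return word, accused, False
```

In this sketch a failed protocol check is reported through the third return value, while a failed cross-lane check only adds the disagreeing node to the set of accused nodes.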


5.4.3. Concurrent diagnosis for RMU and BIU defendants 

It is possible to diagnose the RMU and the BIU defendants concurrently by overlapping the message 
flow graphs for the executions of the Collective Diagnosis protocol for RMU and BIU defendants. Figure 
5.8 shows the resulting pattern. This pattern achieves a significant reduction in the time required to 
complete the self-diagnosis of the bus. 


[Message flow graph with lanes for PEs, BIUs, and RMUs; diagram not reproduced.] 

Figure 5.8: Message flow graph for the concurrent diagnosis of RMU and BIU defendants 

6. Self-Test 


The Self-Test mode serves two purposes. First, it establishes a checkpoint in which the nodes are 
required to exercise and assess the status of their circuitry before attempting to join other nodes on the 
bus. The effectiveness of this activity in stopping faulty nodes is dependent on the fault coverage 
provided by the self-test procedures, which are implementation-dependent. In a non-developmental 
version of the ROBUS, the results of the self-test would be included in the decision of a node to transition 
to a disabled state permanently, or to remain active and try to return to normal operation. For the current 
version of the ROBUS, a failure of the self-test is considered a local failure that should trigger a re-entry 
into the Self-Test mode. If the self-test can detect a particular permanent-fault condition affecting a node, 
the node will remain in the Self-Test mode indefinitely. 

The second purpose of the Self-Test mode is to provide a safe state to which the ROBUS nodes can go 
after detecting a failure and before attempting to re-engage. A detected failure can be caused by 
phenomena external to the bus (e.g., lightning or HIRF) that can have an unknown duration and can affect 
multiple nodes simultaneously. The nodes do not have the means to accurately determine the cause, 
duration, or number of nodes affected by a fault-causing phenomenon. Because of this, worst-case 
conditions are always assumed. The nodes are programmed to disengage from the bus as soon as a failure 
is detected and then run a self-test continuously for a time interval at least as long as the worst-case 
duration of a fault-causing phenomenon plus the worst-case failure detection delay. This behavior 
ensures that the fault-causing phenomenon has subsided and the affected nodes have disengaged from the 
bus by the time a node exits the Self-Test mode. 
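
The lower bound on the self-test interval stated above can be written as a small Python sketch. The function and parameter names are hypothetical, and the numeric values in the usage note are placeholders; the actual computation of the delay is the subject of Appendix G and is not reproduced here.

```python
def min_self_test_interval(max_fault_duration, max_detection_delay):
    """Lower bound on the continuous self-test interval.

    Both arguments and the result use the same time unit (for example,
    clock ticks). The parameter names are hypothetical.
    """
    return max_fault_duration + max_detection_delay


def may_exit_self_test(elapsed, max_fault_duration, max_detection_delay,
                       self_test_passed):
    """True once the node has tested long enough and the self-test passed."""
    required = min_self_test_interval(max_fault_duration, max_detection_delay)
    return self_test_passed and elapsed >= required
```

For example, with a placeholder worst-case fault duration of 1000 ticks and a worst-case detection delay of 250 ticks, a node in this sketch would remain in the Self-Test mode for at least 1250 ticks.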

Figure 6.1 shows the operations performed in the Self-Test mode. A ROBUS node enters this mode 
during a startup after the power-on enable or during a restart after the detection of a local node failure or a 
bus failure. The first action is to reset completely and disable the output. It is assumed that a fault can 
propagate to all the components within an FCR. Therefore, when a fault is detected, the state data of the 
node is considered corrupted, and none of it is carried over, irrespective of whether it is a local failure or a 
bus failure. The disabling of the output ensures that any remaining members of the clique and other 
nodes trying to rejoin the bus will consistently detect that the node is untrustworthy. The next action in 
the Self-Test mode is to execute the self-test until the bus has settled. This duration is called the Upset 
Abatement Delay. Appendix G examines the recovery process in detail and shows how to compute this 
delay. 


[Flow diagram not reproduced. Entry: power-on enable, local failure, or bus failure. Exit: to Clique Detection mode.] 

Figure 6.1: Activities during the Self-Test mode 

7. Clique Detection 


After exiting the Self-Test mode, a node needs to determine if there is a clique operating in the Clique 
Preservation mode or if a new clique needs to be formed. In this version of the ROBUS, the approach 
selected is to assume that there is a clique present on the bus and try to acquire its state while 
simultaneously monitoring for any indications that the clique is not valid. Figure 7.1 shows the 
minor-mode transition graph. A recovering node is unsynchronized when it enters the Clique Detection mode. 
In the Local Diagnosis Acquisition mode, the recovering node observes the nodes of the opposite kind in 
order to determine a trusted set operating in the Clique Preservation mode. Because it is unsynchronized, 
the recovering node can only use its local time for coarse assessment of the timing characteristics of other 
nodes. In the Synchronization Acquisition mode, the recovering node synchronizes to the clique 
leveraging the accumulated diagnostic observations. Local-time synchronization enables the recovering 
node to perform refined timing observations and to gather state data from the synchronous protocols. In 
the Collective Diagnosis Acquisition mode, the recovering node gets the conviction results computed by 
the clique. The information collected up to this point is considered sufficient for the recovering node to 
make a final determination about the presence of a clique. If it has evidence that a clique is present, the 
recovering node will proceed to the Clique Join mode to attempt to become a member of the clique. 
Otherwise, it will transition to the Clique Initialization mode to form a new clique. 


[Transition graph not reproduced. Entry: from Self-Test. Exit: to Clique Join.] 

Figure 7.1: Minor-mode transitions for Clique Detection mode 

The only protocol-independent condition that indicates a local failure of the recovering node is the 
assertion of an accusation against itself. Each of the following protocol-independent conditions indicates 
a clique failure: number of trusted BIUs equal to zero, and number of trusted RMUs equal to zero. 
Protocol-dependent conditions corresponding to a clique failure are described with the corresponding 
protocols. 
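
The minor-mode sequence and the protocol-independent failure conditions described above can be summarized in a short Python sketch. The enumeration, the function `next_mode`, and its arguments are hypothetical names. The sketch also assumes that a detected clique failure leads directly to the Clique Initialization mode and that a local failure leads back to the Self-Test mode, as described for restarts in Section 6.

```python
from enum import Enum, auto


class Mode(Enum):
    LOCAL_DIAGNOSIS_ACQUISITION = auto()
    SYNCHRONIZATION_ACQUISITION = auto()
    COLLECTIVE_DIAGNOSIS_ACQUISITION = auto()
    CLIQUE_JOIN = auto()
    CLIQUE_INITIALIZATION = auto()
    SELF_TEST = auto()


# Normal progression through the minor modes of Clique Detection.
_NEXT = {
    Mode.LOCAL_DIAGNOSIS_ACQUISITION: Mode.SYNCHRONIZATION_ACQUISITION,
    Mode.SYNCHRONIZATION_ACQUISITION: Mode.COLLECTIVE_DIAGNOSIS_ACQUISITION,
    Mode.COLLECTIVE_DIAGNOSIS_ACQUISITION: Mode.CLIQUE_JOIN,
}


def next_mode(mode, self_accused, trusted_bius, trusted_rmus,
              protocol_clique_failure=False):
    """Mode entered when the current Clique Detection minor mode ends.

    mode must be one of the three minor modes of Clique Detection.
    """
    if self_accused:
        # Local failure: an accusation asserted against the node itself.
        return Mode.SELF_TEST
    if trusted_bius == 0 or trusted_rmus == 0 or protocol_clique_failure:
        # Clique failure: no valid clique to join, so form a new one.
        return Mode.CLIQUE_INITIALIZATION
    return _NEXT[mode]
```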

7.1. Local Diagnosis Acquisition 

During Local Diagnosis Acquisition, a recovering node determines a trusted set of opposite kind nodes 
operating in the Clique Preservation mode. For this version of the ROBUS, this is accomplished by 
separate in-line monitors that check the received message patterns for each opposite kind node. Many 
combinations of timing and content checks are possible. There is a tradeoff between the effectiveness of 
the checks and their complexity. In general, a high degree of effectiveness requires a high degree of 
design refinement, which results in a more complex implementation. The chosen approach is considered 
a reasonable balance between the effectiveness and the complexity of the checks. 

The local diagnosis is performed in two phases. In the first phase, each in-line monitor searches for an 
ECHO message from its corresponding source node. Since the sources are assumed to be in Clique 
Preservation, they are expected to broadcast an ECHO message once per re-synchronization period. If a 
particular monitor does not receive an ECHO within the expected time interval, the corresponding source 
is accused. The monitors that receive the ECHO message transition to the second phase of diagnosis in 
which the content sequence for received messages is checked. An ECHO message is the last one in the 
Synchronization Preservation protocol. After that, an in-line monitor expects to receive the messages for 
the Collective Diagnosis, Schedule Update, and PE Communication protocols. For Collective Diagnosis, 
the expected number of messages is constant and only DATA messages are expected. For Schedule 
Update, the expected number of messages is also constant and only PE ERROR or DATA messages are 
allowed. For PE Communication, the number of messages can vary from zero up to a known maximum 
and there are only a few valid message formats. The reception of an INIT message signals the arrival at 
the Synchronization Preservation protocol. The INIT should be followed by an ECHO message. If the 
received message pattern was not valid or the second ECHO is not received within the time of one 
re-synchronization period, the source node can be accused. 
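
The content check of the second phase can be sketched in Python as follows. The sketch covers only the message sequence observed after the first ECHO and omits all timing checks. The string labels, the function name `pattern_valid`, the parameter names, and in particular the set `pe_formats` of valid PE Communication formats are assumptions, since those formats are not listed in this section.

```python
def pattern_valid(messages, n_diag, n_sched, max_pe,
                  pe_formats=frozenset({"DATA", "PE_ERROR"})):
    """Check one message sequence observed after the first ECHO.

    messages: message labels received from one source, in order, up to
              and including the second ECHO (hypothetical encoding).
    n_diag:   constant number of Collective Diagnosis messages.
    n_sched:  constant number of Schedule Update messages.
    max_pe:   known maximum number of PE Communication messages.
    pe_formats: assumed set of valid PE Communication formats.
    """
    msgs = list(messages)
    # The sequence must end with INIT followed by ECHO.
    if len(msgs) < n_diag + n_sched + 2 or msgs[-2:] != ["INIT", "ECHO"]:
        return False
    diag = msgs[:n_diag]
    sched = msgs[n_diag:n_diag + n_sched]
    pe = msgs[n_diag + n_sched:-2]
    return (all(m == "DATA" for m in diag)                       # Collective Diagnosis
            and all(m in ("PE_ERROR", "DATA") for m in sched)    # Schedule Update
            and len(pe) <= max_pe                                # PE Communication
            and all(m in pe_formats for m in pe))
```

A monitor built this way would accuse its source node whenever `pattern_valid` returns `False`, or when the second ECHO does not arrive within one re-synchronization period.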


7.2. Synchronization Acquisition 

After identifying a preliminary set of trusted nodes, the next step is to synchronize to the clique. 
Figure 7.2 shows the required activities. During Frame Synchronization, a recovering node achieves 
coarse synchronization by distinguishing received synchronization messages from different executions of 
the Synchronization Preservation protocol. During Synchronization Capture, the ECHO messages of the 
Synchronization Preservation protocol are used to tightly synchronize to the clique. 


[Activity diagram not reproduced. Entry: Start. Exit: Done.] 

Figure 7.2: Activities for Synchronization Acquisition 

7.2.1. Frame Synchronization 

The purpose of Frame Synchronization is to ensure that the Accept function in Synchronization 
Capture receives the ECHO messages from trustworthy sources participating in the Synchronization 
Preservation protocol. Achieving Frame Synchronization is equivalent to finding the time gap between 
consecutive executions of the Synchronization Preservation protocol. The Frame Synchronization 
protocol for this version of the ROBUS is presented below. The protocol monitors the ECHO messages 
from the trusted nodes identified during Local Diagnosis Acquisition. The time interval measured by the 
gap timer corresponds to the maximum observed relative skew between received ECHO messages from 
trustworthy nodes. The calculation of this skew is described in Appendix C. The protocol has provisions 
for handling trusted nodes that prove to be untrustworthy. In addition, the in-line checks can be used to 
identify untrustworthy nodes. Any opposite-kind node that violates the expectations can be accused. 


Process: 

1. Start the gap timer. 

2. While the gap timer has not expired: 

2.1. If an error is detected, remove the corresponding source from the eligible 
sources. 

2.1.1. Communication checks for each opposite-kind source: 

2.1.1.1. Expecting no link errors 

2.1.2. In-line checks for each opposite-kind source: 

2.1.2.1. At most one RM(SPECIAL, ECHO) message expected before the gap 
timer expires. 

2.2. If an RM(SPECIAL, ECHO) message is received from an eligible source, then 
restart the gap timer. 

3. Done 
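
The process above can be sketched in Python as a replay over a time-ordered list of received events. The event encoding, the function name `frame_synchronization`, and the representation of the gap timer as a deadline are assumptions made for illustration. The sketch interprets check 2.1.2.1 as allowing each source one ECHO for as long as the gap timer keeps running.

```python
def frame_synchronization(events, trusted, gap, start=0):
    """Replay the Frame Synchronization process over recorded events.

    events:  iterable of (time, source, kind) tuples sorted by time, where
             kind is "ECHO", "LINK_ERROR", or any other message label
             (hypothetical encoding).
    trusted: opposite-kind nodes trusted after Local Diagnosis Acquisition.
    gap:     duration measured by the gap timer.
    Returns (done_time, eligible_sources).
    """
    eligible = set(trusted)
    echoed = set()              # sources that have already sent an ECHO
    deadline = start + gap      # step 1: start the gap timer
    for time, source, kind in events:
        if time >= deadline:    # step 2 ends when the gap timer expires
            break
        if kind == "LINK_ERROR":
            eligible.discard(source)        # check 2.1.1.1
        elif kind == "ECHO":
            if source in echoed:
                eligible.discard(source)    # check 2.1.2.1: a second ECHO
            else:
                echoed.add(source)
                if source in eligible:
                    deadline = time + gap   # step 2.2: restart the timer
    return deadline, eligible               # step 3: done
```

With a gap of 10 time units, ECHO messages from two eligible sources at times 3 and 4 move the deadline to 14. A second ECHO from the first source at time 6 removes that source from the eligible set without restarting the timer.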


7.2.2. Synchronization Capture 

During Synchronization Capture, a recovering node synchronizes to the clique by applying the Accept 
function to ECHO messages received from trusted sources. Figure 7.3 shows the message flow graph for 
the Synchronization Preservation protocol augmented to include the P3C and P4C Synchronization 
Capture processes. Recovering RMUs execute process P3C, and recovering BIUs execute process P4C. 
A recovering BIU sends an ECHO message to its PE. The use of an ECHO message rather than an INIT 
message allows the PE to easily recognize that these are different synchronization events with different 
associated timing. 

The P3C and P4C processes are described next. The processes are triggered by the end of the Frame 
Synchronization protocol. Because the recovering nodes are only loosely synchronized, they do not have 
precise expectations about the time of arrival of the ECHO messages. Any messages received that are not 
ECHO messages must be ignored by the protocol. No reception checks are performed to determine the 
eligible voters. These are equal to the trusted nodes of the opposite kind at the start of the protocol. The 
only significant difference in the descriptions for P3C and P4C is that the BIUs in process P4C must send 
an ECHO message to their PEs. Agreement checks are performed by comparing the time between 
relevant events against an expected duration. Appendix C provides additional information about the 
timing aspects of this protocol, including the calculation of the delays measured by the 
synchronization-reset timers in processes P3C and P4C. 



Figure 7.3: Message flow graph for the Synchronization Preservation protocol with Synchronization Capture processes 


Process P3C: RMUs 

1. Receive the RM(SPECIAL, ECHO) messages from the BIUs. 

2. Compute the Accept function for the messages received from the eligible voters. 

2.1. Cross-lane checks for each BIU: 

2.1.1. Expecting agreement with the result of the Accept function 

2.2. Protocol checks: 

2.2.1. Expecting agreement among a majority of the eligible voters 

3. When the Accept output is asserted, start the synchronization-reset timer. 


Process P4C: BIUs 

1. Receive the RM(SPECIAL, ECHO) messages from the RMUs. 

2. Compute the Accept function for the messages received from the eligible voters. 

2.1. Cross-lane checks for each RMU: 

2.1.1. Expecting agreement with the result of the Accept function 

2.2. Protocol checks: 

2.2.1. Expecting agreement among a majority of the eligible voters 

3. When the Accept output is asserted, start the synchronization-reset timer and send 

RM(SPECIAL, ECHO) to the attached PE. 

A cross-lane check error in processes P3C or P4C is sufficient evidence to accuse the corresponding 
node of the opposite kind. 

In processes P3C and P4C, agreement on received message timing is expected for a majority of the 
eligible voters. An error detected by any of these checks is an indication of a clique failure. 


7.3. Collective Diagnosis Acquisition 

During Collective Diagnosis Acquisition mode, a recovering node gets the conviction results 
computed by the clique. To do this, the recovering node executes the Collective Diagnosis protocols as if 
it were part of the clique, with only one exception. For recovering BIUs executing the Collective 
Diagnosis protocol for RMU defendants, there is no expectation that the result in process P4 should equal 
the result in process P2. Similarly, for recovering RMUs executing the Collective Diagnosis protocol for 
BIU defendants, there is no expectation that the result in process P4 should equal the result in process P2. 

8. Clique Join 


During the Clique Join mode, a recovering node demonstrates that it is suitable for admission to the 
clique. For this version of the ROBUS, that consists of operating properly for one full diagnostic cycle. 

Figure 8.1 shows the flow of operation for the Clique Join mode. When the recovering node enters 
this mode, its time and diagnostic state variables are in agreement with the corresponding state variables 
of the clique. This enables the recovering node to internally operate as if it were a member of the clique. 
The only significant difference is that its output is disabled. The node should enable its output just before 
the clique starts gathering local diagnostic observations for the next local diagnostic cycle, which occurs 
at the beginning of Collective Diagnosis. Once the output has been enabled, the recovering node should 
expect to be admitted to the clique after the following diagnostic cycle. This is confirmed by results of 
the Collective Diagnosis protocol indicating that the node is not convicted. If so, the recovering node 
transitions to the Clique Preservation mode to operate as a member of the clique. However, if the node is 
convicted, this is sufficient evidence to conclude that the node or the clique has failed. Therefore, the 
recovering node should transition to the Self-Test mode. 

Each of the following protocol-independent conditions indicates a local failure or a bus failure: 
number of trusted BIUs equal to zero, number of trusted RMUs equal to zero, and assertion of an 
accusation against self. The protocol-dependent conditions are described with the corresponding 
protocols. 

[Transition graph not reproduced. Entry: from Clique Detection. Exit: to Clique Preservation.] 

Figure 8.1: Minor-mode transitions for the Clique Join mode 

9. Clique Initialization 


A node transitions from the Clique Detection mode to the Clique Initialization mode after determining 
that there is not a valid clique operating in the Clique Preservation mode. The purpose of this major mode 
is to form a new clique. A node operates in this mode with the presumption that there is a group of nodes 
trying to form a clique and that all of them enter this mode in the unsynchronized state with a known 
bound on the relative local-time skew. To form a new clique, a set of nodes trying to form the clique 
must be identified, and local time and diagnostic state agreement must be achieved. This is done while 
simultaneously monitoring for indications that a clique cannot be formed. 

Figure 9.1 shows the main activities performed in the Clique Initialization mode. None of the time 
and diagnostic state gathered in the Clique Detection mode is valid once a node transitions to this mode. 
The first step is to clear all the state data and enable the output to allow communication with other nodes. 
In the Initial Diagnosis mode, a preliminary set of nodes trying to form a clique is found. In Initial 
Synchronization, the uncertainty in the local-time synchronization is reduced to the level expected in the 
synchronized state. In Collective Diagnosis, the nodes reach agreement on the membership of the clique. 


[Transition graph not reproduced. Entry: from Clique Detection. Exit: to Clique Preservation.] 

Figure 9.1: Minor-mode transitions for the Clique Initialization mode 

Each of the following protocol-independent conditions indicates a local failure or a bus failure for the 
Clique Initialization mode: number of trusted BIUs equal to zero, number of trusted RMUs equal to zero, 
and assertion of an accusation against self. The protocol-dependent conditions are described with the 
corresponding protocols. 

9.1. Initial Diagnosis 


In the Initial Diagnosis mode, the nodes execute a synchronous protocol to determine an initial trusted 
set taking advantage of the known bound on the synchronization precision when operating in the 
unsynchronized state. Figure 9.8 shows the message flow graph. Notice that there is no visibility to the 
behavior of nodes of the same kind. Thus, a particular node can only assess nodes of the opposite kind. 


[Message flow graph with lanes for BIUs and RMUs; diagram not reproduced.] 

Figure 9.8: Message flow graph for the Initial Diagnosis protocol 

The description of the protocol follows. A reception error in process P1 is sufficient evidence to 
accuse the corresponding node of the opposite kind. 


Process P0: BIUs and RMUs 

1. Broadcast RM(SPECIAL, CLIQUE_INITIALIZATION). 

Process P1: BIUs and RMUs 

1. Receive the messages from the nodes of the opposite kind. 

1.1. Communication checks for each node of the opposite kind: 

1.1.1. Expecting no link errors 

1.2. In-line checks for each node of the opposite kind: 

1.2.1. Expecting reception within a predetermined local-time interval 

1.2.2. Expecting exactly one message 

1.2.3. Expected content: RM(SPECIAL, INITIALIZATION) 


9.2. Initial Synchronization 

The purpose of the local-time synchronization protocol executed in this mode is to reduce the relative 
skew to the level required for normal synchronized operation. The protocol is triggered a fixed delay 
after the completion of the Initial Diagnosis protocol. The Initial Synchronization protocol is based on 
the same basic protocol as the Synchronization Preservation protocol, and it can handle any specified 
bound on the initial relative local-time skew, even one much larger than the final skew. 

Figure 9.2 shows the message flow graph for the protocol. The protocol is an agreement generation 
protocol using the fixed-delay model for point-to-point communication. The labels near the arrows 
indicate the type of SPECIAL message that is transmitted by the sending process. Notice that in process 
P4 the BIUs send to the PEs an ECHO message rather than an INIT message, which is used only by the 
Synchronization Preservation protocol. 

Figure 9.2: Message flow graph for the Initial Synchronization protocol 

The description of the protocol is presented next. Because the bound on the initial relative local-time 
skew can be large, the nodes do not have precise expectations about the time of arrival of the messages. 
The processes ignore any received messages that are not of the expected kind, and no reception error 
checks are performed in any of the processes. The eligible voters are equal to the trusted nodes of the 
opposite kind at the start of the protocol. The communication and process delays for Initial 
Synchronization must be approximately equal to the ones for the Synchronization Preservation protocol in 
order for the final synchronization precision to be the same. This can result in a situation in which the 
protocol delay from P0 to the synchronization reset in P3 and P4 is much smaller than the initial time 
skew. Since the Accept functions assert their outputs shortly after receiving more than half of the 
expected inputs, it is possible that the ECHO messages are broadcast before some trustworthy nodes are 
ready to send their INIT messages in processes P0 and P1. In addition, since the objective of the protocol 
is to generate reference ECHO events to trigger the enabling of the synchronization-reset timers, there is 
no need for a node to send an INIT message after it has sent an ECHO message. The blocking of INITs 
after ECHOs ensures that, in effect, the INIT-sending processes terminate after the completion of the 
ECHO-sending processes. Appendix C provides additional information about the timing aspects of this 
protocol. 
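
Two of the rules described above lend themselves to a short Python sketch: the Accept function asserting its output once more than half of the expected inputs have arrived, and the blocking of INIT messages after an ECHO has been sent. The function names and the dictionary of arrival times are hypothetical. The sketch also simplifies "shortly after" to "at the moment of" the deciding arrival; the precise timing behavior is covered in Appendix C.

```python
def accept_time(arrivals, eligible):
    """Time at which the Accept output would be asserted, or None.

    arrivals: dict mapping source id -> local arrival time of the expected
              message (INIT or ECHO); sources not yet heard are absent.
    eligible: set of eligible voters, i.e. the trusted opposite-kind nodes
              at the start of the protocol.
    """
    times = sorted(t for s, t in arrivals.items() if s in eligible)
    needed = len(eligible) // 2 + 1     # more than half of expected inputs
    return times[needed - 1] if len(times) >= needed else None


def may_send_init(time_to_send, echo_already_broadcast):
    """INIT is blocked once this node has broadcast an ECHO."""
    return time_to_send and not echo_already_broadcast
```

With three eligible voters whose messages arrive at times 5, 7, and 9, the output in this sketch is asserted at time 7, before the slowest voter has been heard.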


Process P0: BIUs 

1. If it is time to send and RM(SPECIAL, ECHO) has not been broadcast, then 
broadcast RM(SPECIAL, INIT). 


Process P1: RMUs 

1. Receive the RM(SPECIAL, INIT) messages from the BIUs. 

2. Compute the Accept function for the messages received from the eligible voters. 

3. When the Accept output is asserted, if RM(SPECIAL, ECHO) has not been 

broadcast, then broadcast RM(SPECIAL, INIT). 


Process P2: BIUs 

1. Receive the RM(SPECIAL, INIT) messages from the RMUs. 

2. Compute the Accept function for the messages received from the eligible voters. 

3. When the Accept output is asserted, broadcast RM(SPECIAL, ECHO). 


Process P3: RMUs 

1. Receive the RM(SPECIAL, ECHO) messages from the BIUs. 

2. Compute the Accept function for the messages received from the eligible voters. 

2.1. Cross-lane checks for each BIU: 

2.1.1. Expecting agreement with the result of the Accept function 

2.2. Protocol checks: 

2.2.1. Expecting agreement among a majority of the eligible voters 

3. When the Accept output is asserted, start the synchronization-reset timer and 

broadcast RM(SPECIAL, ECHO). 


Process P4: BIUs 

1. Receive the RM(SPECIAL, ECHO) messages from the RMUs. 

2. Compute the Accept function for the messages received from the eligible voters. 

2.1. Cross-lane checks for each RMU: 

2.1.1. Expecting agreement with the result of the Accept function 

2.2. Protocol checks: 

2.2.1. Expecting agreement among a majority of the eligible voters 

3. When the Accept output is asserted, start the synchronization-reset timer and send 

RM(SPECIAL, ECHO) to the attached PE. 


A cross-lane check error in processes P3 or P4 is sufficient evidence to accuse the corresponding node 
of the opposite kind. 

In processes P3 and P4, agreement on received message timing is expected for a majority of the 
eligible voters. An error detected by any of these checks is an indication of a clique failure. 


9.3. Collective Diagnosis 

The Collective Diagnosis protocol is described in Section 5. 

10. Concluding remarks 


ROBUS is the central feature of the SPIDER IMA architecture currently in the research and 
technology development phase. The purpose of this work is to design a flexible architecture that can be 
configured to satisfy a wide range of performance and reliability requirements. It is envisioned that 
ultimately the ROBUS will be a family of communication systems based on a common theory of 
fault-tolerant distributed computation and communication. ROBUS-2 is a concept-demonstration version of 
the ROBUS intended for laboratory experimentation and demonstrations of the capabilities. This section 
summarizes the attributes of ROBUS-2 presented in the previous sections. This is followed by an 
overview of some of the current ideas that may be explored to develop the concept. 


10.1. ROBUS-2 

ROBUS-2 provides four basic services to the PEs: message broadcast, communication schedule 
update, time reference, and self-diagnosis. The message broadcast service is realized by the PE Broadcast 
protocol, which is a Byzantine Agreement protocol that ensures result consistency even if the source PE 
or the source BIU is arbitrarily faulty. Communication schedule update uses the Schedule Update 
protocol to provide PE-fault tolerance and ensure schedule-update consistency at the RMUs, BIUs, and 
PEs. The time reference service uses the Synchronization Preservation protocol, an agreement protocol 
that generates precise periodic reference events used by RMUs, BIUs, and PEs to synchronize their 
local-time clocks. Self-diagnosis is realized by the internal ROBUS diagnostic system, which consists of local 
and collective diagnostic processing to assess the status of individual nodes and the bus as a whole. The 
Collective Diagnosis protocol is an agreement protocol that processes the local diagnostic assessments 
and establishes a consistent view of the status of every ROBUS node. The appendices present the 
supporting fault-tolerance theory for these services, including the fault conditions that guarantee the 
expected results. 

Time-triggered operation is enabled by the periodic execution of the Synchronization Preservation 
protocol. The known precision bound on the relative local-time skew is exploited by using distributed 
synchronous composition to coordinate distributed operations. The synchronous communication model 
solves the basic problem of transferring data between independently clocked nodes while maintaining 
overall timing predictability in the execution of distributed protocols. 

Enforcement of the scheduled bus access pattern for the PE-message broadcast service is relatively 
simple given the system topology and the use of time-triggered operation. There are three layers of 
protection against unauthorized bus access. The first line of enforcement is at the source BIUs, which are 
expected to forward the messages of their attached PEs at the scheduled time only. The second layer is 
realized by the routing function at each RMU, which is programmed to relay the messages from a 
particular PE at the scheduled time only. The final enforcement layer is at the receiving BIUs, which use 
exact-majority, dynamic word voting to filter out messages relayed by untrustworthy RMUs. 

The diagnostic system is designed based on a transient-fault model for the ROBUS nodes. All node 
faults are considered transient and, accordingly, a node is never permanently removed from the system. 
A single instance of detected misbehavior is considered sufficient evidence to remove a node from the 
trusted set. Readmission into the trusted set is allowed once a full error-free diagnostic cycle has been 
completed. This is a conservative diagnostic policy. 
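
The diagnostic policy described above can be illustrated with a minimal Python sketch of trusted-set bookkeeping. The class `TrustTracker` and its methods are hypothetical, and the sketch models a single observer only. In ROBUS-2 the convictions are agreed upon through the Collective Diagnosis protocol, which is not modeled here.

```python
class TrustTracker:
    """Trusted-set bookkeeping under a transient-fault model (sketch)."""

    def __init__(self, nodes):
        self.nodes = set(nodes)
        self.trusted = set(nodes)
        self._errors = set()    # nodes with an error in the current cycle

    def report_error(self, node):
        # A single detected misbehavior removes the node immediately.
        self.trusted.discard(node)
        self._errors.add(node)

    def end_cycle(self):
        # A node is trusted again only after a full cycle without errors;
        # no node is ever removed permanently.
        self.trusted = self.nodes - self._errors
        self._errors = set()
```

In this sketch a node that misbehaves in one diagnostic cycle stays out of the trusted set at the end of that cycle and is readmitted at the end of the next cycle, provided that cycle is error-free.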

Dynamic voting is the main mechanism for neutralizing undiagnosed node failures. Node trust and 
voter eligibility are reassessed for each voting protocol operation using the latest collective diagnosis 
results as well as the local diagnostic information in order to achieve a maximum reconfiguration rate that 
is much faster than the rate of execution of the Collective Diagnosis protocol. Input-error detection, 
dynamic voting, fail-stop node behavior, and fast reconfiguration allow a clique to continue service 
delivery to the PEs for some scenarios in which a large number of nodes become untrustworthy within a 
short time interval. 

For ROBUS-2, the only difference between startup and restart is the response-triggering event. The 
node recovery procedure is independent of whether the trigger is a power-on enable, a local failure, or a 
bus failure. The Self-Test mode provides some assurance that a node is operating properly before it 
attempts to recover. In the Clique Detection mode, a recovering node attempts to acquire the state of a 
clique while monitoring for indications that one is not present. Successful acquisition of the state is 
followed by a transition to the Clique Join mode to demonstrate that it is suitable for admission into the 
clique. A recovering node in the Clique Detection mode transitions to the Clique Initialization mode if it 
detects that there is not a valid clique operating in the Clique Preservation mode. Node diagnosis in order 
to establish which nodes can be trusted is the most basic function for fault-tolerant operation in the 
ROBUS distributed system. Local Diagnosis Acquisition and Initial Diagnosis compute initial trusted 
sets for the Clique Detection and Clique Initialization modes, respectively. In addition to having a trusted 
set, the nodes must synchronize their local-time clocks before being able to communicate efficiently. The 
Synchronization Acquisition mode protocols provide the means for a recovering node to synchronize to 
an existing clique. The Initial Synchronization protocol allows recovering nodes to synchronize to form a 
new clique. With synchronized operation established, the nodes are able to use more rigorous error 
detection and diagnosis, thus strengthening the fault-tolerance of a clique. Appendix G examines the 
startup and restart capabilities. 

The ability of a ROBUS clique to maintain coordination independently of PE failures is realized by 
the PE-error checks, which report detected PE errors to the BIUs, and more importantly, by the agreement 
generation properties of the Schedule Update and PE Broadcast protocols. Appendices D and E examine 
the properties of these protocols. 


10.2. ROBUS-X 

ROBUS-1 was a proof-of-concept design meant to demonstrate some basic features of the ROBUS. 
That version, presented in [Miner 02], had a simple initialization mechanism, no failure recovery 
capability, relatively simple error detection and diagnosis, and a static communication schedule (i.e., 
loaded off-line). The scheme for collective diagnosis relied on an interactive-consistency 
message-broadcast protocol to ensure diagnosis agreement, but that resulted in a heavy performance penalty. 
Furthermore, the implementation allowed at most one PE message to be “in transit” through the bus at 
any one time. This sequential end-to-end processing limitation severely restricted the maximum message 
throughput of the bus. 

ROBUS-2 is another proof-of-concept design meant to demonstrate a combination of attributes, 
including robust fault tolerance, high message throughput, and a dynamically updateable communication 
schedule (i.e., loaded on-line). ROBUS-2 incorporates much of the theoretical insight into the operation 
of the ROBUS fault-tolerance protocols gained since the first version was designed. A hardware 
realization of the BIU and RMU nodes currently under development achieves a maximum message 
throughput of one PE message per clock tick by pipelining the processes of the PE Broadcast protocol. 
Ultimately, the throughput of the bus measured in messages per clock tick will be limited by the 
throughput of the communication links. That realization of ROBUS-2 will be implemented on a
COTS-based platform with the ROBUS functions programmed on field-programmable gate arrays (FPGAs).
Fault injection in a number of environments, including VHDL simulations, high-intensity radiated fields 
(HIRF) and neutron particle radiation, will likely be used to assess the robustness of the fault-tolerance 
and to gain additional insight for further development. 

The following ideas may be explored for further development of ROBUS. 

• For ROBUS-2, the PE bus access sequence is determined by the fixed identification numbers. This 
constraint can be easily relieved by augmenting the schedule update to include an access-sequence 
number for each PE. The PEs would access the bus in a Round Robin with respect to the dynamically 
assigned access-sequence numbers, rather than their fixed identification numbers. This is a much 
more flexible access pattern. 

• ROBUS-2 uses a simple message format: (Tag, Payload), with a payload of one word. A more 
flexible and efficient communication system may be achieved with a variable-length message format 
with one or more words per message. 

• A ROBUS-2 clique operating in the Clique Preservation mode delivers services in a fixed cyclic 
sequence. This sequence was chosen because it is a simple representative pattern that fulfills the 
basic requirement of making these services available to the PEs, and because it simplifies the state 
acquisition sequence in the Clique Detection mode. With this pattern, all the services operate at the 
same rate. For real applications, additional consideration should be given to the service timing 
requirements of the PEs, including the PE-to-PE communication requirements. A study should be 
conducted to determine the best service-delivery sequence taking into consideration the requirements 
of the ROBUS and the PEs. 

• The diagnostic system of ROBUS-2 only recognizes nodes as relevant diagnosable elements. The 
links are considered to be parts of the nodes. The result is that a node can be accused or convicted 
when messages are not propagating properly through a majority of its input or output links, even if the
node itself is computing properly and many of its links are in good condition. One way to improve 
the fault tolerance of the ROBUS may be to increase the granularity of the diagnostic system to
consider both nodes and links, and then to modify the protocols to exploit the additional diagnostic
information. The preferred redundancy management strategy for some applications is to use 
whatever resources are available in order to continue service delivery (i.e., “never give up”). 

• The design of the diagnostic system of ROBUS-2 is based on two principles: any evidence of 
misbehavior is sufficient to distrust a node, and one diagnostic cycle without errors is sufficient to re- 
establish trust. The main constraint for the inclusion of error checks and diagnostic rules in the 
design was to ensure compliance with the required properties of correctness and agreement on non- 
asymmetric defendants. This approach is adequate for a technology development activity meant to 
assess and demonstrate the potential capabilities of ROBUS, but the result may be a design that is 
excessively complex for real-world applications. Furthermore, such a simple diagnostic policy may 
not yield optimum reliability or the highest probability of tolerating correlated transient-fault events. 
For future versions of the ROBUS, we intend to explore the use of diagnostic policies based on the 
on-line diagnosis algorithms presented in [Walter 97], especially algorithm HD, which can be tuned 
to differentiate between permanent and transient faults. Studies on the relation between diagnostic 
policies and the resulting attributes for ROBUS (e.g., reliability, survivability, cost, etc.) would be 
useful in guiding further development activities. The SPIDER reliability study presented in 
[Latronico 04] is a good starting point. In addition to addressing high-level attributes, studies should
address issues like the selection of error checks and the rules for mapping detected errors to node 
trust, voter eligibility, and other diagnostic system outputs. 

• The time kept by ROBUS-2 is not synchronized to external events. In essence, the Synchronization
Preservation and Initial Synchronization protocols take as inputs a distributed event generated at the 
BIUs and compute new distributed events with higher precision bounds. If synchronization to an 
external time reference is required by the PEs, they must realize that capability independently from 
the ROBUS time service. 

The ROBUS synchronization protocols can be modified as follows to synchronize the bus and the 
PEs to an external event: the external event is read by the PEs, which generate messages that are sent 
to the BIUs using fixed-delay communication; once the BIUs receive these messages, they generate 
INIT messages just like for the current protocols; the remaining operations of the protocols remain as 
they are. Having the bus synchronized to an important external event enables timely interaction 
between the PEs and the external world. 

• The number of point-to-point communication links for the ROBUS topology increases with the
number of BIUs and RMUs. For a system with N BIUs and M RMUs, the total number of
one-directional links for BIU-RMU communication is 2NM. Including the PE-BIU links, the total
number of one-directional links grows to 2N(M + 1). If the links are implemented using bidirectional 
communication cables, the required number of cables can be reduced to NM for the BIU-RMU 
communication links, or N(M + 1) including PE-BIU links. The wiring for BIU-RMU 
communications can be reduced by using one-to-many broadcast links driven by a single transmitter 
at each BIU and each RMU. In that case, the number of cables can be reduced to N+M for the BIU- 
RMU communication links, or 2N + M overall. A study should be conducted to determine the 
advantages and disadvantages of various wiring options. 

The total amount of cabling required (in terms of total aggregate length) can be reduced by using the 
configuration depicted in Figure 10.1. In this configuration, BIUs and RMUs are in close proximity 
to one another forming what is essentially a fault-tolerant hub. The PEs may be placed in widely 
distributed locations. The total number of individual communication paths remains unchanged, but 
the amount of cabling can be significantly less than for a configuration in which the BIUs and RMUs
are farther away from one another. 


Figure 10.1: ROBUS in a fault-tolerant hub configuration




• The configuration shown in Figure 10.1 emphasizes the importance of the PE-BIU links. One idea for
further developing the ROBUS is to explore the design of PE-BIU interfaces based on common commercial
communication links (e.g., IEEE-1394 FireWire). Additional logic would have to be added at the PE and BIU ends to
complement such links in order to realize all the functionality required for operation in the ROBUS 
(e.g., message format translation, clock synchronization, error detection, and BIU interfacing). 

• ROBUS-2 was designed to operate with at most one active clique on the bus at any given time. No 
assurance of any kind is given about the behavior of the system if this condition is violated. It is 
known that if there are two or more mutually exclusive cliques simultaneously active on the bus, and 
the nodes in one clique distrust the nodes in the other cliques, then the system can remain in that state 
indefinitely. For some applications it is important to have assurance that such a multi-clique 
condition is impossible or extremely improbable. Under relatively benign conditions such that the 
BIUs and RMUs fail mainly because of physical degradation or random localized transient faults, an
existing clique should be able to remain active and attract recovering nodes. The probability that a 
new clique does not form, or that multiple ones do, is likely to be higher when the bus is exposed to 
harsh conditions that can overwhelm the fault tolerance of a clique. For some applications, avoidance 
of harsh conditions is a practical way to ensure that the system remains safely within its fault 
tolerance limits. For other applications, the ROBUS must be capable of reliably re-establishing a 
single clique. If the assumptions of the Clique Initialization mode (see Section 9 and Appendices C 
and G) are guaranteed to be satisfied, the restart approach in ROBUS-2 is adequate. Otherwise, some 
other means to enable the ROBUS to return to normal operation, preferably on its own, will be 
necessary. At this time we do not have protocols that allow the ROBUS to return to normal operation 
from an arbitrary state (i.e., self-stabilize). 

• ROBUS-2 nodes of a particular kind (i.e., BIU or RMU) are differentiated only by their assigned 
identification numbers. In every other respect, the nodes are considered to be part of a single uniform 
group. One way to improve the design is to divide the nodes into a core group and a client group. 
The core group would be composed of all the RMUs and a subset of the BIUs. The core nodes and 
their corresponding PEs would be responsible for computing the basic bus functions like 
synchronization, collective diagnosis, and communication schedule update. The computed results for 
these functions would be broadcast to the client BIUs, whose main tasks would be to provide access 
to the bus to additional PEs. The client BIUs would also have to gather diagnostic information and 
send it to the core group for the collective diagnosis. This allocation of functions would reduce the 
complexity of the client BIUs, and possibly the RMUs as well. [Kopetz 87] proposed a similar 
approach to implement a clock synchronization service in generic distributed real-time systems. 

• The ROBUS enables the development of several fault-tolerance strategies combining simplex PE 
nodes. Figure 10.2 presents a sample configuration with three PEs in a triple modular redundant 
(TMR) configuration (labeled vPE 0, or virtual Processing Element 0), four processors in a dual-dual 
configuration (vPE 2), and a single simplex processor (vPE 1). For ROBUS-2, the bus interacts with 
the PEs as if they were part of a single uniform group, and the distributed SPIDER operating system 
manages the PE configuration. Note that ROBUS-2 supports having only a subset of the PEs 
compute the communication schedule. Since PE_ERROR messages are not eligible inputs to the 
Schedule Update protocol, the PEs that are not going to send a schedule update to the bus can remove 
themselves from consideration by signaling an error to their BIUs, which would then send 
PE_ERROR messages. In addition, note that the design of ROBUS-2 is compatible with dynamic 
reconfiguration strategies at the PE level, in which the processors get reassigned to particular virtual 
PEs in real time as operating modes change or failures occur. The bus is completely unaware of such 
rearrangements. 

Future concept-development efforts could explore moving some PE-configuration support functions
to the bus. For example, consider a system in which PE-communication scheduling is done with 
respect to the virtual PEs. For the configuration in Figure 10.2 during the PE Communication mode, 
the ROBUS could vote on the fly on the messages simultaneously sent by the PEs in vPE 0 and then
broadcast the results. For vPE 2, the ROBUS could monitor messages simultaneously sent by dual 
PEs 2 and 7 and broadcast the messages from one of them so long as a discrepancy is not detected 
between the two input streams. If a discrepancy were detected during the transmission, the ROBUS 
would immediately switch to dual PEs 4 and 5. Note that the updating of the communication 
schedule could also take advantage of these features by using one predetermined virtual PE to 
compute the schedule and then having the ROBUS vote only on the inputs from the corresponding 
PEs. A system with such features would have a faster response to PE faults and reduced 
communication bandwidth requirement. In order to support these configuration-dependent functions, 
the ROBUS would have to have information about how the PEs are configured. A protocol similar to 
the Schedule Update protocol can be used to download such information from the PEs to the ROBUS. 


Figure 10.2: Sample PE configuration

• Figure 10.3 illustrates another concept for configuring the PEs. Here the PEs and their BIUs are 
divided into groups based on common attributes (e.g., implementing a particular function). Each 
group is composed of one or more PEs. The RMUs support independent broadcast communication 
within each group, as well as inter-group communication. Thus, the RMUs implement multicast 
communication with one or more simultaneous input message streams and each stream relayed to one 
or more groups. This configuration can be visualized as being the result of merging multiple basic 
ROBUS systems with corresponding RMUs linked by a routing function. Such a combined system is 
more capable than a single ROBUS-2 system, but it is also more complex. The time reference service 
can be easily implemented using one group as a core and the rest as clients that are synchronized with 
respect to the core. The scheduling of messages would have to take into consideration inter-group 
messages. Diagnosis, both local and collective, would be modified to use policies that exploit 
knowledge of how the BIUs and PEs are grouped. 
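
The link counts given earlier in this section for a system with N BIUs and M RMUs can be tabulated with a short script. The following Python sketch only evaluates the formulas stated in the text. The function name and the dictionary keys are hypothetical labels introduced here for illustration; they are not part of the ROBUS design.

```python
def link_counts(n_bius, m_rmus):
    """Evaluate the wiring formulas from the text for N BIUs and M RMUs."""
    n, m = n_bius, m_rmus
    return {
        # Point-to-point, one-directional links
        "one_directional_biu_rmu": 2 * n * m,
        "one_directional_with_pe": 2 * n * (m + 1),
        # Point-to-point, bidirectional cables
        "bidirectional_biu_rmu": n * m,
        "bidirectional_with_pe": n * (m + 1),
        # One-to-many broadcast links, one transmitter per BIU and per RMU
        "broadcast_biu_rmu": n + m,
        "broadcast_with_pe": 2 * n + m,
    }


if __name__ == "__main__":
    for name, count in link_counts(4, 3).items():
        print(f"{name}: {count}")
```

Under these formulas, a bus with 4 BIUs and 3 RMUs needs 24 one-directional BIU-RMU links, 12 bidirectional BIU-RMU cables, or 7 broadcast cables. The counts say nothing about cable length, which depends on the physical placement of the nodes.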

Other areas that may be explored for further development include ways to explicitly support event- 
triggered messages with strict end-to-end latency constraints, and efficient ways of implementing the 
ROBUS system in hardware and/or software. Future versions of ROBUS, generically designated as 
ROBUS-X, will likely have more refined designs geared toward practical applications with strict cost and 
complexity constraints. 

Figure 10.3: PE groups with multicast ROBUS communication


Appendix A. ROBUS fault-tolerance fundamentals


This appendix describes the basic theory of distributed computation applied to the design of the 
ROBUS protocols for the delivery of fault-tolerant services to the PEs. 

The presentation in the following three subsections uses concepts adapted from the material in 
[Avizienis 04], [Laprie 95], and [Suri 95]. 


A.1. Faults, errors, and failures

In general, a system has a specification and is composed of multiple components or sub-systems. 
Likewise, each of these sub-systems has a specification and an internal structure of interconnected 
components. In what follows, it is assumed that the specifications at every hierarchical level are correct 
and free from ambiguity, omission, and other kinds of defects. 

The terms fault, error, and failure are used to describe a cause-and-effect relationship between 
undesired circumstances in the context of the hierarchical composition of a system. A failure occurs 
when the behavior of a system fails to provide the desired service. Failure is assessed at the external 
interface of a system and is determined by deviations from the behavior expected according to the 
specification. An error is a deviation from the intended value and/or timing of data (including signals 
and state variables) at a particular hierarchical level. A fault is a defect in a system component. 

The fault, error, and failure terms facilitate the structured analysis of the failure characteristics of a 
system and the determination of failure causality chains from low-level components to higher-level 
components. In a simple chain, the failure of a system is due to the presence of an error in the system. 
This error is caused by a faulty component that failed to deliver the expected service. At this point, the 
component can be seen as a system and the failure causality chain expanded by further exploring the 
hierarchical structure. The chain ends when a component is reached beyond which no internal structure 
can be discerned or is of interest. 
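
The simple causality chain described above can be illustrated with a small hierarchical structure. The following Python sketch is an illustration under simplifying assumptions: each component carries a single `failed` flag, only the first failed part is followed at each level, and the class, function, and component names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    name: str
    failed: bool = False
    parts: List["Component"] = field(default_factory=list)


def causality_chain(system):
    """Follow a simple failure chain from a failed system down the hierarchy.

    At each level the failed part is a fault from the viewpoint of its
    parent. The chain ends at a component with no failed parts, that is,
    one whose internal structure is not modeled any further.
    """
    chain = []
    current = system
    while current is not None and current.failed:
        chain.append(current.name)
        current = next((p for p in current.parts if p.failed), None)
    return chain


receiver = Component("input receiver", failed=True)
node = Component("node 2", failed=True, parts=[receiver, Component("voter")])
system = Component("system", failed=True, parts=[Component("node 1"), node])

print(" <- ".join(causality_chain(system)))
```

Real systems can have several faulty components contributing to one failure; the sketch follows only one branch in order to mirror the simple chain discussed in the text.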


A.2. Fault characteristics 

The following fault classification criteria were considered during the design process of the ROBUS. 


A.2.1. Cause 

The causes of faults can be divided into two main categories: design faults and physical faults. Design 
faults are specification or implementation mistakes such that the system does not function as desired. 
The fault-tolerance capabilities of ROBUS are not meant to handle design faults. Instead, formal analysis 
and other design verification activities are used to minimize their introduction (for example, see [Geser 
04] and [Pike 04]). Physical faults are caused by internal defects and external disturbances. Examples of 
internal defects include manufacturing imperfections, component wear-out, internal electromagnetic 
interference (EMI) (e.g., cross-talk and ground bounce), and radioactive impurities in semiconductor 
parts. External disturbances that can cause faults include particle radiation, external EMI (e.g., lightning, 
high-intensity radiated fields, and electrostatic discharge), input power fluctuations, and environmental 
extremes (e.g., temperature, vibration, and shock). Part of the design process of a system is to assess and 
control the time rate of occurrence of physical faults. Examples of ways to manage the physical fault
rates include component selection, shielding, environmental qualification testing, reliability testing, 
control of the operational environment, and preventive maintenance. 


A.2.2. Correlation and extent 

Physical faults on two separate components are independent if there is no causal or common cause 
relation between them. Otherwise, the faults are correlated. In real systems, faults can propagate from 
one component to another and they can be caused by the same instance of a particular phenomenon, 
especially when exposed to external disturbances that can reach multiple components. Thus, complete 
independence between faults occurring at separate components is not entirely possible. The extent of a 
fault refers to the number of affected components. For the ROBUS topology, this can range from one 
node to every node on the bus. Note that the PEs can be affected by the same phenomena as the BIUs and 
RMUs. The ROBUS is designed to handle scenarios involving a large number of nodes simultaneously 
becoming faulty. 


A.2.3. Activity 

A component is said to have experienced a fault when its behavior deviates from its specification. A 
fault is said to be latent or dormant if it has not affected the externally observed behavior of the 
component. Once an error occurs at the interface of the component, the fault is said to be active. 
Similarly, an error is latent when it has not propagated to the system interface, and it becomes active once 
it causes a failure. This concept is used when considering the effects of faults at the interface of 
individual BIU, RMU, and PE nodes, as well as at the interface of the ROBUS viewed as a unit. 


A.2.4. Duration 

Permanent faults appear and remain present in a system until they are removed through a 
maintenance action. Transient faults appear for a limited duration of time. Depending on the structure 
and behavior of a system, both permanent and transient faults can cause errors and failures of permanent 
or transient duration. The fault type of main interest for this developmental version is the single-event 
transient that is active for a bounded duration of time. Nevertheless, the redundancy management system 
of the ROBUS can handle many forms of permanent and transient faults. 


A.2.5. Consistency of perception 

Consistency of perception refers to the degree to which observations of the fault manifestations differ 
among the direct observers. A fault is symmetric if all properly working observers receive agreeing 
inputs. Otherwise, the fault is called asymmetric. Consistency of perception is an important fault 
characteristic for the ROBUS since asymmetric manifestations can threaten the integrity of a clique by 
causing divergence in distributed computation results. The ROBUS protocols are designed to handle a 
bounded number of simultaneously active asymmetric faulty nodes. 


A.2.6. In-line detectability

If a direct observer can detect input errors caused by a fault using independent in-line observations 
(i.e., without comparison against data from redundant sources), then the observer can take appropriate 
actions to prevent the propagation of the errors to the computation results. Input error detection by means 
of communication and in-line checks enables the ROBUS nodes to identify and remove erroneous data 
from consideration in local protocol operations. In general, an increase in the percentage of in-line 
detectable faults results in a reduction of the ROBUS failure probability. 


A.2.7. Diagnosability 

In the context of the ROBUS, a fault is diagnosable if the properly working observers of the affected 
node can identify the node as a source of errors. The diagnosability of a fault is limited by the error 
detection and diagnosis capabilities of the observers, as well as by the message flow patterns and types of 
operations required by the distributed protocols. The ROBUS nodes handle diagnosable faults by 
removing offending nodes from the trusted set. In general, an increase in the percentage of diagnosable 
faults results in a reduction of the ROBUS failure probability. 


A.3. Fault and error containment 

The most basic element of the fault-tolerance strategy of the ROBUS is the fault containment region 
(FCR). The purpose of the FCR is to ensure a high degree of independence between physical faults 
occurring in different system components. Each BIU and RMU must be in a separate FCR. The building 
of FCRs requires careful consideration of the physical characteristics of a system to ensure that a proper 
degree of containment is present for all coupling paths. Examples of ways to achieve fault independence 
include the use of independent power supplies for each FCR, independent cooling systems, separate 
electromagnetic shielding for each FCR, fiber-optic data links, and physical distancing of the FCRs. In 
practical terms, it is impossible to achieve complete fault independence for all possible faults. A special 
concern is faults caused by external EMI, like lightning and high intensity electromagnetic fields. This 
kind of phenomenon has the potential to engulf a system causing simultaneous faults in multiple FCRs. 
Shielding and other techniques can be used to minimize the threat to a system, but total elimination of the 
threat is not always possible. The goal in real systems is to achieve a degree of fault independence 
between FCRs that is acceptable for the application. 

Error containment is the second layer of the fault-tolerance strategy of the ROBUS. The error 
containment mechanisms of the ROBUS are aimed at preventing the propagation of errors between FCRs. 
The ultimate goal of the redundancy management strategy of the ROBUS is to prevent the propagation of 
errors to the external interfaces of the bus at the BIUs. Given that each PE is attached to a single BIU, 
which could be faulty, the bus is considered to have experienced a failure only when errors reach the 
interface of a fault-free BIU that is a clique member. The ROBUS performs an internal failure 
assessment by monitoring its own activity and searching for violations of the conditions that ensure the
effective performance of the error containment mechanisms. Such a violation is an indication that error 
containment cannot be guaranteed. 
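
The failure criterion stated above can be written as a simple predicate. The following Python sketch is only an illustration: the `BIUView` record and its field names are hypothetical, and the sketch takes an omniscient view of each BIU that no real node has.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BIUView:
    fault_free: bool              # the BIU itself is not faulty
    clique_member: bool           # the BIU currently belongs to the clique
    error_at_pe_interface: bool   # an error reached its external interface


def bus_has_failed(bius: Iterable[BIUView]) -> bool:
    """The bus fails only if an error reaches a fault-free clique-member BIU."""
    return any(
        b.fault_free and b.clique_member and b.error_at_pe_interface
        for b in bius
    )


# An error at the interface of a faulty BIU does not count as a bus failure.
print(bus_has_failed([BIUView(False, True, True), BIUView(True, True, False)]))
# An error at the interface of a fault-free clique member does.
print(bus_has_failed([BIUView(True, True, True), BIUView(True, True, False)]))
```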


A.4. Node health and inclusion status

Due to the diagnostic and recovery capabilities of the ROBUS, the analysis must take into 
consideration more than just the fault status of the nodes. A node can be operating according to its 
specification, yet, if it is recovering, it may not be ready to deliver the services expected by a clique. In 
addition, even if a node can be relied upon, the clique is not able to integrate it immediately because trust 
is only asserted at the boundaries of collective diagnostic intervals. On the other hand, a clique may not 
have enough evidence to remove a particular untrustworthy node whose behavior is not covered by the 
error detection and diagnosis capabilities of the system. 

The following criteria determine the health and inclusion status of a node. 

• Goodness: A node is good if it behaves according to its specification. Otherwise, the node is bad or
faulty.

• Trustworthiness: A node is trustworthy if it is suitable to participate in the delivery of services to 
the PEs. Otherwise, the node is untrustworthy. A trustworthy node must be good and its state must 
be in agreement with the state of other good clique members. 

• Diagnostic status: A node is trusted if it is a clique member, and thus allowed to participate in the 
execution of distributed operations. Otherwise, the node is distrusted. 
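
The three criteria above are related but independent, which a small example can make concrete. The following Python sketch encodes them as a record with derived properties. The field names are hypothetical, and treating trustworthiness as exactly "good and in state agreement" is a simplifying assumption drawn from the definitions in this appendix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeStatus:
    good: bool           # behaves according to its specification
    state_agrees: bool   # state agrees with the good clique members
    clique_member: bool  # currently included in the clique

    @property
    def trustworthy(self) -> bool:
        # A trustworthy node must be good and in state agreement.
        return self.good and self.state_agrees

    @property
    def trusted(self) -> bool:
        # Diagnostic status: a node is trusted if it is a clique member.
        return self.clique_member


def describe(node):
    return (
        "good" if node.good else "faulty",
        "trustworthy" if node.trustworthy else "untrustworthy",
        "trusted" if node.trusted else "distrusted",
    )


# The three situations discussed in the opening paragraph of this section.
recovering = NodeStatus(good=True, state_agrees=False, clique_member=False)
awaiting_integration = NodeStatus(good=True, state_agrees=True, clique_member=False)
undiagnosed_faulty = NodeStatus(good=False, state_agrees=False, clique_member=True)

for label, node in [
    ("recovering", recovering),
    ("awaiting integration", awaiting_integration),
    ("undiagnosed faulty", undiagnosed_faulty),
]:
    print(label, describe(node))
```

The second case is a node that can be relied upon but has not yet been integrated, and the third is an untrustworthy node that the clique has not yet been able to remove.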


A.5. Fault model 

For the analysis of the ROBUS, a distributed system in which a group of nodes collaborate to achieve 
common goals, it is more useful to know how the behavior of a node is perceived at the receivers (or 
direct observers) than what actually occurs at the output of the node itself. In this section, first, the 
behavior of a node is classified according to its manifestations at the direct observers for a given 
transmission, and then this classification is leveraged to define a general node fault model. This model is 
a modified version of the hybrid fault-effect model presented by Thambidurai and Park in
[Thambidurai 88].

A.5.1. Instantaneous behavioral manifestations 

For this model, the classification of the behavior of a node applies to one message transmission, 
expected or unexpected, from the node to its trustworthy direct observers. As used here, the validity of 
the behavior of a node depends on the specific activity (e.g., process P2 of the Collective Diagnosis 
protocol in the Clique Preservation mode) being carried out by the trustworthy direct observers, which are 
either clique members or, in the case of recovering observers, are in state agreement with a clique. A 
transmission (or a non-transmission, if one is not expected, according to the specification) from a given 
node is valid if it is functionally equivalent to the behavior expected from a trustworthy node of the same 
kind. Both the timing and the content characteristics of a transmission are important determinants of the 
perceived behavior at the direct observers. 

The following categories are mutually exclusive and, taken together, cover all possible behavioral 
manifestations. 

• Valid: The behavior of the node perceived by the trustworthy direct observers is in accordance with
the specification. 

• Manifest: The node does not behave as expected by its trustworthy direct observers and this is 
detected by all of them using their input error detectors. 

• Symmetric: The node does not behave as expected by its trustworthy direct observers, which receive 
consistent invalid inputs from the node, but fail to detect the misbehavior of the node using their input 
error detectors. 

• Asymmetric: The node does not behave as expected by its trustworthy direct observers, which 
receive inconsistent inputs from the node, and some or all of them fail to detect the misbehavior of the 
node using their input error detectors. 

The classification of a given physical fault is dependent on the particular input error detectors used by 
the trustworthy direct observers at the time of the message. For example, if there are no input error 
detectors active on the direct observers, then a fault may be classified as symmetric or asymmetric, but 
not manifest. Likewise, a fault that is classified as symmetric or asymmetric for one set of active error 
detectors may be classified as manifest for a different set of detectors. 

In addition to the error detectors, the membership of the set of trustworthy direct observers is a critical 
determinant of the classification of a fault. For example, a fault is classified as asymmetric if only one 
trustworthy direct observer receives an undetected inconsistent input, but it would be classified as either 
manifest or symmetric if that particular observer became untrustworthy and everything else remained the 
same. 
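
The four categories and their dependence on the error detectors and on the observer set can be sketched in a few lines of Python. The sketch rests on a simplifying assumption that is not stated in the text: each trustworthy direct observer is reduced to a pair (received value, error detected), and an observer that detects an error is taken to perceive "error" regardless of the value received. All names are hypothetical.

```python
from enum import Enum


class Manifestation(Enum):
    VALID = "valid"
    MANIFEST = "manifest"
    SYMMETRIC = "symmetric"
    ASYMMETRIC = "asymmetric"


ERROR = object()  # what an observer perceives when its detectors fire


def classify(expected, observations):
    """Classify one transmission.

    `observations` maps each trustworthy direct observer to a pair
    (received value, error detected by its input error detectors).
    """
    if not observations:
        raise ValueError("at least one trustworthy direct observer is required")
    perceived = [ERROR if detected else value
                 for value, detected in observations.values()]
    if all(p is ERROR for p in perceived):
        return Manifestation.MANIFEST        # detected by all observers
    if all(p is not ERROR and p == expected for p in perceived):
        return Manifestation.VALID           # as specified, for everyone
    first = perceived[0]
    if all(p is not ERROR and p == first for p in perceived):
        return Manifestation.SYMMETRIC       # consistent, invalid, undetected
    return Manifestation.ASYMMETRIC          # observers disagree


# The same corrupted transmission, without and with active error detectors.
undetected = classify(5, {"A": (9, False), "B": (9, False)})
detected = classify(5, {"A": (9, True), "B": (9, True)})

# Observer C misses an error that A and B detect; then C is removed.
with_c = {"A": (9, True), "B": (9, True), "C": (9, False)}
without_c = {name: obs for name, obs in with_c.items() if name != "C"}

print(undetected.value, detected.value)
print(classify(5, with_c).value, classify(5, without_c).value)
```

In this sketch the first pair of calls yields symmetric and then manifest, and the second pair yields asymmetric and then manifest, matching the two examples given in the text.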

There is not a one-to-one relation between symmetry at the transmitter (i.e., consistent generation of a 
message) and consistency at the observers (i.e., consistent reception of a message). For example, it is
possible that the trustworthy direct observers receive the same message but disagree on the result of input 
error detection. This can happen if there is a timing violation in the transmission of the message such that 
not all of the observers determine that the received message arrived within the expected time interval. 
Note that an omissive node failure in which a node does not transmit an expected message always has 
symmetric manifestations. This is one of the reasons for designing the ROBUS nodes to disable their 
outputs upon detection of a failure, rather than sending some sort of “I_AM_FAULTY” message that 
could have asymmetric manifestations. 
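
The timing example above can be made concrete with a small sketch. The following Python code assumes, for illustration only, that each observer timestamps the arrival with its own local clock and accepts the message if the timestamp falls inside a reception window. The window bounds, arrival times, and function names are hypothetical values chosen for the example.

```python
def within_window(arrival_times, window):
    """Return, per observer, whether the message passed the timing check.

    An arrival time of None models an omitted message.
    """
    start, end = window
    return {
        observer: arrival is not None and start <= arrival <= end
        for observer, arrival in arrival_times.items()
    }


def observers_agree(checks):
    """True if every observer reached the same timing-check result."""
    return len(set(checks.values())) == 1


window = (100, 110)  # acceptance window in local clock ticks

# A late message lands near the window edge; small differences in the
# observers' local clock readings split their verdicts.
late = within_window({"A": 109, "B": 110, "C": 111}, window)

# An omitted message is missed by every observer alike.
omitted = within_window({"A": None, "B": None, "C": None}, window)

print(late, observers_agree(late))
print(omitted, observers_agree(omitted))
```

In the late case the observers receive the same content but disagree on its validity. In the omission case all of them reach the same verdict, which is the property exploited by having nodes disable their outputs upon detecting a failure.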


A.5.2. Node fault model 

The node fault model separates the nodes into two main categories: trustworthy and untrustworthy. A 
node is untrustworthy if it is faulty or its state disagrees with the state of the trustworthy nodes. The 
externally perceived behavior of an untrustworthy node is dependent on factors like the characteristics of 
the fault affecting the node, the internal design of the node, the specific activity being carried out by the 
clique, and the specific input error detectors used by its trustworthy direct observers. The fault model 
avoids these complications by defining behavioral categories based on sets of instantaneous behavioral 
manifestations and allowing the behavior of the nodes to vary within the range of the corresponding set. 
This approach simplifies the abstract analysis of the bus, from which guidelines are then derived for the 
design of the nodes. The categories for the node fault model are the following. 

• Trustworthy: The node is good and it can be counted upon to correctly deliver the expected services.
The node exhibits only valid instantaneous behavioral manifestations. 

• Benign: The node is untrustworthy with valid or manifest instantaneous behavioral manifestations. 

• Symmetric: The node is untrustworthy with valid, manifest, or symmetric instantaneous behavioral 
manifestations. 

• Asymmetric: The node is untrustworthy with valid, manifest, symmetric, or asymmetric 
instantaneous behavioral manifestations. 
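
The categories above form nested sets of allowed manifestations, which the following Python sketch makes explicit. The table is taken directly from the list above; the function name is hypothetical. Note that observed behavior alone cannot establish that a node is trustworthy, since an untrustworthy node may also exhibit only valid manifestations, so the function returns every category consistent with the observations rather than a single answer.

```python
ALLOWED = {
    "trustworthy": {"valid"},
    "benign": {"valid", "manifest"},
    "symmetric": {"valid", "manifest", "symmetric"},
    "asymmetric": {"valid", "manifest", "symmetric", "asymmetric"},
}


def consistent_categories(observed):
    """List the node categories whose allowed set covers the observations."""
    observed = set(observed)
    unknown = observed - ALLOWED["asymmetric"]
    if unknown:
        raise ValueError(f"unknown manifestations: {sorted(unknown)}")
    return [category for category, allowed in ALLOWED.items()
            if observed <= allowed]


print(consistent_categories({"valid"}))
print(consistent_categories({"valid", "manifest"}))
print(consistent_categories({"symmetric"}))
print(consistent_categories({"asymmetric"}))
```

Each additional kind of manifestation observed narrows the list toward the asymmetric category, which places no restriction on the behavior of the node.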

In addition, the timing of transitions between the trustworthy and untrustworthy categories is modeled as follows.

• A node can transition from trustworthy to untrustworthy at any time. 

• A node can transition from untrustworthy to trustworthy only at boundaries of the collective 
diagnostic intervals. 


A.6. Basic design of the ROBUS protocols 

The ROBUS protocols perform diverse functions. However, most of the protocols are based on the 
unified fault-tolerance protocol presented in [Miner 04]. This section describes the basic concepts used to 
design the protocols. 

The ROBUS protocols are composed of one or two processing phases, each consisting of two 
computation stages. A stage refers to the transmission of a message from one or more nodes of a 
particular kind to nodes of the opposite kind, and it involves the Send Process of the source nodes, their 
broadcast transmission links, and the Receive and Computation Processes at the nodes of the opposite 
kind. A phase is a complete message flow cycle in which messages are sent by a particular set of nodes, 
processed by nodes of the opposite kind, and then the results are returned to the first nodes for additional 
processing. Figure A.1 illustrates these concepts. The nodes are labeled simply as LEFT and RIGHT 
since, due to the symmetries of the ROBUS topology and of the basic protocol design concepts, it is not 
significant in the basic theory to know which are BIUs and which are RMUs. (Stages 2 and 3 form a 
third phase not shown in Figure A.1.) 

[Figure: LEFT and RIGHT node columns linked by Stage 1 through Stage 4; Stages 1 and 2 form Phase 1, and Stages 3 and 4 form Phase 2.] 

Figure A.1: Illustration of protocol stages and phases 


A.6.1. Properties of protocol stages 


Individual stage operations are the building blocks of the ROBUS protocols. In general, a stage 
consists of the transmission of data from one or more source nodes to receiving nodes of the opposite 
kind, followed by the application of a voting function to reduce the received data to a single value, which 
is the result of the stage operation. The ROBUS nodes use dynamic voting, in which only a selected 
group of inputs is considered in the voting operation. The sources whose inputs are allowed to participate 
in the vote are called the eligible voters. The eligibility conditions for a particular stage operation depend 
on the purpose of the protocol, the operation performed by the stage, and the position of the stage in the 
protocol. 

The fundamental theory of operation of the ROBUS protocols is based on the middle-value-select 
voting function. Let $E$ denote the number of eligible voters. For this version of the ROBUS, the middle 
value is given by the input in the $\lceil (E + 1)/2 \rceil$-th position when the eligible input values are arranged in 
order from minimum to maximum. 
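
A minimal Python sketch of this voting rule is given below. It is an illustration under stated assumptions, not the ROBUS implementation: the function names and the dictionary-based representation of received inputs are hypothetical.

```python
def middle_value_select(values):
    """Return the input in the ceil((E + 1) / 2)-th position (1-indexed) of the sorted list."""
    if not values:
        raise ValueError("at least one eligible input is required")
    ordered = sorted(values)
    position = (len(ordered) + 2) // 2  # integer form of ceil((E + 1) / 2)
    return ordered[position - 1]


def dynamic_vote(received, eligible):
    """Vote only on the inputs whose source identifier is in the eligible set.

    received: mapping from source identifier to received value (assumed layout).
    eligible: set of source identifiers allowed to participate in the vote.
    """
    return middle_value_select([v for src, v in received.items() if src in eligible])
```

For an odd number of eligible voters this is the ordinary median. For an even number, the position $\lceil (E + 1)/2 \rceil$ selects the upper of the two central values.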

Two models of communication can be used to transmit data: exact and inexact. Let $v_s$ denote the 
value transmitted by a trustworthy source, and let $v_r$ denote the value received by a trustworthy receiver. 
In the exact communication model, each of the trustworthy receiving nodes receives the same value 
transmitted by a trustworthy source (i.e., $v_r = v_s$). In the inexact communication model, the transmission 
introduces uncertainty in the values received by the trustworthy nodes. The trustworthy receiving nodes 
do not necessarily get the same value transmitted by a trustworthy source, but the received value at each 
receiver satisfies the following constraint: $v_s - \epsilon_l \le v_r \le v_s + \epsilon_h$, where $\epsilon_l$ and $\epsilon_h$ denote the low-side and 
high-side error bounds (i.e., the error bounds in the negative and positive directions), respectively. The 
bound on the total communication imprecision, denoted by $\epsilon$, is $\epsilon_l + \epsilon_h$. 

The behavior of untrustworthy sources must be taken into consideration to determine the results of 
stage operations. Only eligible voters can influence the voting results. A transmission from an 
untrustworthy eligible voter can have symmetric or asymmetric manifestations at the trustworthy 
receiving nodes of the opposite kind. The meaning of symmetric and asymmetric manifestations depends 
on the model of communication. For exact communication, a symmetric manifestation means that all of 
the trustworthy receivers get exactly the same arbitrary value. For inexact communication, a symmetric 
manifestation means that the trustworthy receivers get arbitrary values that differ from one another by at 
most $\epsilon$. An asymmetric manifestation for exact and inexact communication simply means that there are 
no relational constraints for the values received by trustworthy receivers. For some voter eligibility 
conditions, it is possible that the receivers do not agree on the eligibility of asymmetric sources. 

The following subsections present some important properties of stage operations for the exact and 
inexact communication models. 


A.6.1. 1. Voting with exact communication 

Let $v_{s,\min}$ and $v_{s,\max}$ denote the bounds for the values transmitted by the trustworthy sources, and let $\Delta$ 
denote the bound on the range of the values transmitted by the trustworthy sources: $\Delta = v_{s,\max} - v_{s,\min}$. $v_{r,i}$ 
denotes the voting result at receiving node $i$. 

$\mathrm{EV}_i$ denotes the set of eligible voters at receiving node $i$. This set may be composed of trustworthy 
and untrustworthy sources. $\mathrm{Twy\_EV}_i$, $\mathrm{Sym\_EV}_i$, and $\mathrm{Asym\_EV}_i$ denote the sets of trustworthy, 
symmetric untrustworthy, and asymmetric untrustworthy eligible voters at node $i$, respectively. It is 
assumed that all of the trustworthy sources are eligible to vote at each trustworthy receiving node. In 
addition, it is assumed that the trustworthy receiving nodes agree on the eligibility of trustworthy and 
symmetric untrustworthy sources, but they may not agree on the eligibility of asymmetric sources. For 
the properties presented next, it is also assumed that the set of eligible voters at each trustworthy receiving 
node contains more trustworthy sources than untrustworthy ones. That is: 

$|\mathrm{Twy\_EV}_i| > |\mathrm{Sym\_EV}_i| + |\mathrm{Asym\_EV}_i|$, 

where receiving node $i$ is trustworthy, and $|\cdot|$ denotes the set cardinality function. 

Validity: At each trustworthy receiver, the result of the voting function is in the interval $[v_{s,\min}, v_{s,\max}]$. 

Proof: Conceptually, the middle-value-select voter at receiver $i$ selects the value in the 
$\lceil (|\mathrm{EV}_i| + 1)/2 \rceil$-th position from a list of eligible input values arranged in order from minimum to maximum. The 
assumption that at each trustworthy receiver the set of eligible voters contains more trustworthy sources 
than untrustworthy ones implies that at least $\lceil (|\mathrm{EV}_i| + 1)/2 \rceil$ eligible input values (i.e., a majority) are in 
the interval $[v_{s,\min}, v_{s,\max}]$. Thus, at each trustworthy receiver there are fewer than $\lceil (|\mathrm{EV}_i| + 1)/2 \rceil$ 
untrustworthy eligible voters. Although the values from untrustworthy eligible voters can be arbitrary, 
even if all of their values were less than $v_{s,\min}$, there are not enough of them to cause the selection of a 
value smaller than $v_{s,\min}$. Likewise, even if all of their values were larger than $v_{s,\max}$, the selected value 
would be at most $v_{s,\max}$. In general, the selection is a value from a trustworthy source, or the value from 
an untrustworthy source in the interval $[v_{s,\min}, v_{s,\max}]$. 
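
The validity argument can be checked exhaustively for small cases. The following sketch is only an illustration: the helper `validity_holds` and the candidate value grid are assumptions introduced here. It tries every combination of untrustworthy input values from the grid and reports whether the vote ever leaves the range of the trustworthy values.

```python
from itertools import product


def middle_value_select(values):
    ordered = sorted(values)
    return ordered[len(ordered) // 2]  # 0-indexed form of position ceil((E + 1) / 2)


def validity_holds(trusted, n_untrusted, candidates):
    """Check validity for every assignment of candidate values to untrustworthy voters."""
    lo, hi = min(trusted), max(trusted)
    for bad in product(candidates, repeat=n_untrusted):
        result = middle_value_select(list(trusted) + list(bad))
        if not lo <= result <= hi:
            return False
    return True
```

With three trustworthy inputs and two untrustworthy ones the check passes for any candidate grid. With two of each, the majority assumption is violated and a pair of large untrustworthy values pulls the result out of range.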

Agreement propagation: The results of the middle-value-select voting functions at the trustworthy 
receivers differ from one another by at most $\Delta$. 

Proof: The validity property shows that the values selected at the trustworthy receivers are in the 
interval $[v_{s,\min}, v_{s,\max}]$, which corresponds to a total range of $\Delta$. The actual agreement range can be smaller 
than this depending on the values received from untrustworthy eligible voters. 

Agreement generation: If the sets of eligible voters at the trustworthy receiving nodes do not include 
asymmetric untrustworthy sources (i.e., $|\mathrm{Asym\_EV}_i| = 0$ for each trustworthy receiving node $i$), then the 
voting results at the trustworthy receivers exactly agree. 

Proof: If the sets of eligible voters at the trustworthy receivers do not include asymmetric sources, 
then the property of agreement for non-asymmetric defendants ensures that all the sets are identical. 
Every remaining eligible source is either trustworthy or symmetric, so with exact communication it 
delivers the same value to each trustworthy receiver. Therefore, the trustworthy receivers vote on the 
same set of values. Thus, the voting results will be the same. 


A.6.1.2. Voting with inexact communication 

For voting with inexact communication, the same assumptions are made as for voting with exact 
communication. The only significant difference here is the imprecision in the received values. The 
following properties are satisfied. 

Validity: At each trustworthy receiver, the result of the voting function is in the interval 
$[v_{s,\min} - \epsilon_l, v_{s,\max} + \epsilon_h]$. 

Proof: Since the minimum value transmitted by a trustworthy source is $v_{s,\min}$, the minimum value 
received by the trustworthy receivers from a trustworthy source is $v_{s,\min} - \epsilon_l$. Likewise, the maximum 
value transmitted is $v_{s,\max}$, and the maximum value received from a trustworthy source is $v_{s,\max} + \epsilon_h$. 
Similarly to the case of voting with exact communication, the middle-value-select voter at receiver $i$ 
selects the value in the $\lceil (|\mathrm{EV}_i| + 1)/2 \rceil$-th position from a list of eligible input values arranged in order 
from minimum to maximum. In addition, at least $\lceil (|\mathrm{EV}_i| + 1)/2 \rceil$ eligible input values (i.e., a majority) are 
in the interval $[v_{s,\min} - \epsilon_l, v_{s,\max} + \epsilon_h]$, and there are fewer than $\lceil (|\mathrm{EV}_i| + 1)/2 \rceil$ untrustworthy eligible voters. 
Although the values from untrustworthy eligible voters can be arbitrary, even if all of their values were 
less than $v_{s,\min} - \epsilon_l$, there are not enough of them to cause the selection of a value smaller than $v_{s,\min} - \epsilon_l$. 
Likewise, even if all of their values were larger than $v_{s,\max} + \epsilon_h$, the selected value would be at most 
$v_{s,\max} + \epsilon_h$. In general, the selection is a value from a trustworthy source, or the value from an untrustworthy 
source in the interval $[v_{s,\min} - \epsilon_l, v_{s,\max} + \epsilon_h]$. 

Agreement propagation: The results of the middle-value-select voting functions at the trustworthy 
receivers differ from one another by at most $\Delta + \epsilon$. 

Proof: The validity property shows that the values selected at the trustworthy receivers are in the 
interval $[v_{s,\min} - \epsilon_l, v_{s,\max} + \epsilon_h]$, which corresponds to a total range of: $(v_{s,\max} + \epsilon_h) - (v_{s,\min} - \epsilon_l) = \Delta + \epsilon$. 
The actual agreement range can be smaller than this depending on the values received from untrustworthy 
eligible voters. 

Agreement generation: If the sets of eligible voters at the trustworthy receiving nodes do not include 
asymmetric untrustworthy sources (i.e., $|\mathrm{Asym\_EV}_i| = 0$ for each trustworthy receiving node $i$), then the 
voting results at the trustworthy receivers agree within $\epsilon$. 

Proof: If the sets of eligible voters at the trustworthy receivers do not include asymmetric sources, 
then all the sets are identical. In addition, the inexact communication model ensures that, for each eligible 
voter, any two trustworthy receivers receive values that differ by at most $\epsilon$. Let $\mathrm{EV}$ denote the set of 
eligible voters. The voters at the trustworthy receivers select the value in the $\lceil (|\mathrm{EV}| + 1)/2 \rceil$-th position 
from a list of eligible input values arranged in order from minimum to maximum. Let node $x$ be the 
trustworthy receiver that has the result with the smallest value, denoted by $v_x$. Thus, node $x$ received 
values smaller than or equal to $v_x$ from at least $\lceil (|\mathrm{EV}| + 1)/2 \rceil$ eligible sources. The corresponding 
receptions at the other trustworthy receivers can have a maximum value of at most $v_x + \epsilon$. Therefore, the 
voting results at those trustworthy receivers will be smaller than or equal to $v_x + \epsilon$, but not smaller than $v_x$. 
Thus, the voting results at the trustworthy receivers will agree within $\epsilon$. 
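
The bound can be explored numerically. The sketch below is an assumption-laden illustration, not part of the ROBUS design: it models two trustworthy receivers, gives every eligible source a nominal value, and lets each reception deviate by an offset drawn from the set {-eps_l, 0, +eps_h}, so that the two receptions of any one source differ by at most eps_l + eps_h. The function name `worst_case_spread` and the integer offsets are hypothetical.

```python
from itertools import product


def middle_value_select(values):
    ordered = sorted(values)
    return ordered[len(ordered) // 2]  # 0-indexed form of position ceil((E + 1) / 2)


def worst_case_spread(nominal, eps_l, eps_h):
    """Largest difference between the votes of two receivers over all offset choices."""
    offsets = (-eps_l, 0, eps_h)
    # Each source gets one offset at receiver a and one at receiver b.
    per_source = list(product(offsets, repeat=2))
    worst = 0
    for choice in product(per_source, repeat=len(nominal)):
        at_a = [v + da for v, (da, _) in zip(nominal, choice)]
        at_b = [v + db for v, (_, db) in zip(nominal, choice)]
        spread = abs(middle_value_select(at_a) - middle_value_select(at_b))
        worst = max(worst, spread)
    return worst
```

With nominal values 10, 50, and -30 and error bounds 1 and 2, the worst-case spread equals 3, which is the total imprecision. The spread never exceeds that value regardless of how far apart the nominal values are, which is the point of the agreement generation property.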


A.6.2. Properties of protocol phases 

A protocol phase is composed of two consecutive protocol stages in which the results of the first stage 
determine the inputs of the second stage. For each phase in Figure A.1, the LEFT nodes are the initial 
data sources and the final receivers, and the RIGHT nodes are the intermediate receivers and sources. 
Phases 1 and 2 are called the agreement generation phase and the agreement propagation phase, 
respectively. The processes for the agreement generation phase are labeled P0, P1, and P2. For the 
agreement propagation phase, the processes are P2, P3, and P4. In what follows, we refer to processes 
P1, P2, P3, and P4 in Figure A.1 as receiving processes, in order to differentiate them from process P0, 
which does not involve the reception of messages. 

Consider the agreement generation phase. The LEFT nodes execute processes P0 and P2, and the 
RIGHT nodes execute process P1. Let $\mathrm{EV}_{P1,i}$ and $\mathrm{EV}_{P2,j}$ denote the sets of eligible voters in process P1 at 
RIGHT node $i$ and in process P2 at LEFT node $j$, respectively. It is assumed that the set of eligible voters 
at each trustworthy receiver of a particular kind includes all the trustworthy sources of the opposite kind. 
In addition, it is assumed that the trustworthy receiving nodes of a particular kind agree on the eligibility 
of trustworthy and symmetric untrustworthy sources of the opposite kind, but they may not agree on the 
eligibility of asymmetric sources. For the properties presented next, it is also assumed that the set of 
eligible voters at each trustworthy receiving node contains more trustworthy sources than untrustworthy 
ones. For process P1 at the RIGHT nodes: 


$|\mathrm{Twy\_EV}_{P1,i}| > |\mathrm{Sym\_EV}_{P1,i}| + |\mathrm{Asym\_EV}_{P1,i}|$, 


for trustworthy receiving RIGHT node $i$. For process P2 at the LEFT nodes: 


$|\mathrm{Twy\_EV}_{P2,j}| > |\mathrm{Sym\_EV}_{P2,j}| + |\mathrm{Asym\_EV}_{P2,j}|$, 


for trustworthy receiving LEFT node $j$. Similar assumptions are made for processes P3 and P4 of the 
agreement propagation phase. 

In addition, it is assumed that at every trustworthy LEFT receiver or at every trustworthy RIGHT 
receiver there are no asymmetric eligible voters for any of the corresponding receiving processes. That is: 

$|\mathrm{Asym\_EV}_{P1,i}| = 0$ and $|\mathrm{Asym\_EV}_{P3,i}| = 0$ for each trustworthy RIGHT receiver $i$, or 

$|\mathrm{Asym\_EV}_{P2,j}| = 0$ and $|\mathrm{Asym\_EV}_{P4,j}| = 0$ for each trustworthy LEFT receiver $j$. 

The following subsections present some important properties of phase operations for the exact and 
inexact communication models. 


A.6.2.1. Agreement generation phase 

Let $v_{P0,\min}$ and $v_{P0,\max}$ denote the bounds for the values transmitted by the trustworthy LEFT nodes for 
process P0. The value transmitted by process P1 is equal to the result of its vote. Thus, $v_{P1,\min}$ and $v_{P1,\max}$ 
denote the bounds for the voting results and the transmitted values for process P1 at the trustworthy 
RIGHT nodes. $v_{P2,\min}$ and $v_{P2,\max}$ denote the bounds for the voting result for process P2 at the trustworthy 
LEFT nodes. $\Delta_{P0}$, $\Delta_{P1}$, and $\Delta_{P2}$ denote the bounds on the range of the values at the trustworthy 
nodes for processes P0, P1, and P2, respectively. 

Operations with the exact communication model are considered first, followed by operations with the 
inexact communication model. 


A.6.2.1. 1. Voting with exact communication 

The following properties hold for an agreement generation phase with exact communication. 

Validity: At each trustworthy LEFT node, the result of the vote in process P2 is in the interval 
$[v_{P0,\min}, v_{P0,\max}]$. 

Proof: Based on the validity property for a stage operation with exact communication, the voting 
results for process P1 at trustworthy RIGHT nodes are in the interval $[v_{P0,\min}, v_{P0,\max}]$. The validity 
property applied to the second stage ensures that the voting results for process P2 at trustworthy LEFT 
nodes are also in the interval $[v_{P0,\min}, v_{P0,\max}]$. 

Agreement generation: The voting results for process P2 at the trustworthy LEFT nodes exactly 
agree (i.e., $\Delta_{P2} = 0$). 


Proof: Two cases must be considered. 

Case 1: $|\mathrm{Asym\_EV}_{P1,i}| = 0$: According to the agreement generation property for a stage operation with 
exact communication, the voting results for process P1 at trustworthy RIGHT nodes will exactly agree 
(i.e., $\Delta_{P1} = 0$). The agreement propagation property ensures that the range is preserved by the second 
stage. Therefore, the voting results exactly agree. 

Case 2: $|\mathrm{Asym\_EV}_{P2,j}| = 0$: Based on the agreement propagation property for a stage operation, the 
voting results for process P1 at trustworthy RIGHT nodes agree within $\Delta_{P0}$ (i.e., $\Delta_{P1} = \Delta_{P0}$). The 
agreement generation property for a stage operation ensures that the voting results for process P2 at 
trustworthy LEFT nodes exactly agree. 


A.6.2.1.2. Voting with inexact communication 

The following properties hold for an agreement generation phase with inexact communication. 

Validity: At each trustworthy LEFT node, the result of the vote in process P2 is in the interval 
$[v_{P0,\min} - 2\epsilon_l, v_{P0,\max} + 2\epsilon_h]$. 

Proof: Based on the validity property for a stage operation with inexact communication, the voting 
results for process P1 at trustworthy RIGHT nodes are in the interval $[v_{P0,\min} - \epsilon_l, v_{P0,\max} + \epsilon_h]$. The 
validity property applied to the second stage ensures that the voting results for process P2 at trustworthy 
LEFT nodes are in the interval $[v_{P0,\min} - 2\epsilon_l, v_{P0,\max} + 2\epsilon_h]$. 

Agreement generation: The voting results for process P2 at the trustworthy LEFT nodes agree within 
$2\epsilon$ (i.e., $\Delta_{P2} = 2\epsilon$). 


Proof: Two cases must be considered. 

Case 1: $|\mathrm{Asym\_EV}_{P1,i}| = 0$: According to the agreement generation property for a stage operation with 
inexact communication, the voting results for process P1 at trustworthy RIGHT nodes will agree within $\epsilon$. 
The agreement propagation property ensures that the range of values will increase by at most $\epsilon$ in the 
second stage. Therefore, the voting results for process P2 at the trustworthy LEFT nodes agree within $2\epsilon$. 

Case 2: $|\mathrm{Asym\_EV}_{P2,j}| = 0$: Based on the agreement propagation property for a stage operation, the 
voting results for process P1 at trustworthy RIGHT nodes agree within $\Delta_{P0} + \epsilon$. The agreement 
generation property ensures that the voting results for process P2 at trustworthy LEFT nodes agree within 
$\epsilon$. 


Agreement generation is an important property for the ROBUS clock synchronization protocols. It 
implies that the maximum range of the voting results for process P2 at trustworthy LEFT nodes is 
independent of the initial range of values transmitted by process P0. 
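
A small worked example may help make this independence concrete. The sketch below covers only the exact communication model and a configuration matching Case 2: one asymmetric LEFT source in the first stage and no untrustworthy RIGHT sources in the second stage. The function name, the node counts, and the numeric values are assumptions chosen for illustration.

```python
def middle_value_select(values):
    ordered = sorted(values)
    return ordered[len(ordered) // 2]  # 0-indexed form of position ceil((E + 1) / 2)


def agreement_generation_phase(trusted_left, asym_to_right):
    """Run stages 1 and 2 of the phase with exact communication.

    trusted_left: values sent by the trustworthy LEFT nodes in process P0.
    asym_to_right: value the single asymmetric LEFT node shows to each RIGHT node.
    All RIGHT nodes are assumed trustworthy, so stage 2 has no asymmetric voters.
    """
    # Stage 1 (process P1): each RIGHT node votes on the LEFT values it received.
    p1 = [middle_value_select(trusted_left + [bad]) for bad in asym_to_right]
    # Stage 2 (process P2): each trustworthy LEFT node receives the same P1 results.
    p2 = [middle_value_select(p1) for _ in trusted_left]
    return p1, p2


p1_results, p2_results = agreement_generation_phase([10, 20, 30], [0, 25, 100])
```

Here the trustworthy LEFT values span a range of 20, the asymmetric source drives the P1 results to 20, 25, and 30, and every trustworthy LEFT node still obtains 25 in process P2.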


A.6.2.2. Agreement propagation phase 

An agreement propagation phase operates under the same assumptions and has the same properties as 
an agreement generation phase. An agreement propagation phase serves to ensure that all of the 
trustworthy nodes agree on the result of the agreement generation phase and to provide a way for good 
recovering nodes to acquire the protocol result, even if their sets of eligible voters do not completely agree 
with the sets of eligible voters at the trustworthy nodes of the same kind. 

The value transmitted by process P2 for the agreement propagation phase is equal to the result of its 
vote for the agreement generation phase. Let $v_{P2,\min}$ and $v_{P2,\max}$ denote the bounds for the values 
transmitted by process P2 at the trustworthy LEFT nodes. The value transmitted by process P3 is equal 
to the result of its vote. Thus, $v_{P3,\min}$ and $v_{P3,\max}$ denote the bounds for the voting results and the 
transmitted values for process P3 at the trustworthy RIGHT nodes. $v_{P4,\min}$ and $v_{P4,\max}$ denote the bounds 
for the voting results for process P4 at the trustworthy LEFT nodes. $\Delta_{P2}$, $\Delta_{P3}$, and $\Delta_{P4}$ denote the bounds 
on the range of the voting results at the trustworthy nodes for processes P2, P3, and P4, respectively. 

In general, for the trustworthy LEFT nodes, there is little or no difference between taking the voting 
result of P2 or P4 as the protocol result. For the actual ROBUS protocols, symmetry of implementation, 
timing, and other factors are taken into consideration to determine which result to use. 

Good recovering nodes perform the same voting operations as trustworthy nodes of the same kind in 
order to acquire the result of the protocol. The processes for good recovering nodes trying to capture the 
protocol result are labeled P3C and P4C, corresponding to processes P3 and P4 for trustworthy nodes, 
respectively. The most significant difference between good recovering nodes and trustworthy nodes is 
that their sets of eligible voters can differ in the number of trustworthy, symmetric, and asymmetric voters. 
For each good recovering node, it is assumed that the set of eligible voters contains more trustworthy 
sources than untrustworthy ones. That is, for good recovering RIGHT node $i$: 

$|\mathrm{Twy\_EV}_{P3C,i}| > |\mathrm{Sym\_EV}_{P3C,i}| + |\mathrm{Asym\_EV}_{P3C,i}|$. 

For good recovering LEFT node $j$: 

$|\mathrm{Twy\_EV}_{P4C,j}| > |\mathrm{Sym\_EV}_{P4C,j}| + |\mathrm{Asym\_EV}_{P4C,j}|$. 

It is not assumed that all of the trustworthy sources of the opposite kind are considered eligible voters 
at good recovering nodes. Since the eligible voter sets at trustworthy nodes of a particular kind are 
assumed to include all of the trustworthy sources of the opposite kind, the set of trustworthy eligible 
voters at good recovering nodes must necessarily be a subset of the set of trustworthy eligible voters at 
trustworthy nodes of the same kind. Thus, for good recovering RIGHT node $i$: 

$|\mathrm{Twy\_EV}_{P3C,i}| \le |\mathrm{Twy\_EV}_{P3,i}|$. 

For good recovering LEFT node $j$: 

$|\mathrm{Twy\_EV}_{P4C,j}| \le |\mathrm{Twy\_EV}_{P4,j}|$. 

No assumption is made about the number of asymmetric untrustworthy sources in the eligible voter 
sets of good recovering nodes. 

The following presents properties of operations with the exact and inexact communication models for 
trustworthy nodes and good recovering nodes. 


A.6.2.2.1. Voting with exact communication 

The following properties hold for an agreement propagation phase with exact communication. 

Validity at the trustworthy RIGHT nodes: At the trustworthy RIGHT nodes, the result of the vote 
in process P3 is in the interval $[v_{P0,\min}, v_{P0,\max}]$. 

Proof: Based on the validity property for a stage operation with exact communication, the voting 
results for process P3 at trustworthy RIGHT nodes are in the interval $[v_{P2,\min}, v_{P2,\max}]$. According to the 
validity property for the agreement generation phase, this interval is contained in $[v_{P0,\min}, v_{P0,\max}]$. 

Validity at the good recovering RIGHT nodes: At the good recovering RIGHT nodes, the result of 
the vote in process P3C is in the interval $[v_{P0,\min}, v_{P0,\max}]$. 

Proof: The proof for process P3 at the trustworthy RIGHT nodes applies here. 

Agreement propagation at the trustworthy RIGHT nodes: The result of the vote for process P3 at 
the trustworthy RIGHT nodes is equal to the result of the vote for process P2 at the trustworthy LEFT 
nodes. 

Proof: The agreement generation phase ensures exact agreement for process P2 at the trustworthy 
LEFT nodes (i.e., $v_{P2,\min} = v_{P2,\max}$). The validity property and the agreement propagation property for a 
stage operation ensure that the result for process P3 at the trustworthy RIGHT nodes exactly matches the 
results for process P2 at the trustworthy LEFT nodes. Thus: $v_{P3,\min} = v_{P3,\max} = v_{P2,\min} = v_{P2,\max}$. 

Agreement propagation at the good recovering RIGHT nodes: The result of the vote for process 
P3C at the good recovering RIGHT nodes is the same as the result of the vote for process P2 at the 
trustworthy LEFT nodes. 

Proof: The proof for process P3 at the trustworthy RIGHT nodes applies here. 
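
The capture of the protocol result by a good recovering node can be sketched as follows. This is an illustration under the exact communication model only; the function name and arguments are hypothetical. The recovering node sees the common P2 result from each trustworthy LEFT source in its eligible voter set, plus arbitrary values from any untrustworthy eligible voters.

```python
def middle_value_select(values):
    ordered = sorted(values)
    return ordered[len(ordered) // 2]  # 0-indexed form of position ceil((E + 1) / 2)


def recovering_capture(p2_result, n_trusted, untrusted_values):
    """Vote of a good recovering RIGHT node (process P3C) with exact communication.

    p2_result: the value on which all trustworthy LEFT nodes agree after process P2.
    n_trusted: number of trustworthy LEFT sources in the recovering node's eligible set.
    untrusted_values: arbitrary values received from untrustworthy eligible voters.
    """
    if n_trusted <= len(untrusted_values):
        raise ValueError("eligible voters must contain a trustworthy majority")
    return middle_value_select([p2_result] * n_trusted + list(untrusted_values))
```

As long as the trustworthy majority assumption holds, the vote returns the P2 result no matter what the untrustworthy voters send, even when the recovering node's eligible set omits some trustworthy sources.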

Validity at the trustworthy LEFT nodes: At the trustworthy LEFT nodes, the result of the vote in 
process P4 is in the interval $[v_{P0,\min}, v_{P0,\max}]$. 

Proof: Based on the validity property for a stage operation with exact communication, the voting 
results for process P4 at trustworthy LEFT nodes are in the interval $[v_{P3,\min}, v_{P3,\max}]$. According to the 
validity property for the trustworthy RIGHT nodes in an agreement propagation phase, this interval is 
contained in $[v_{P0,\min}, v_{P0,\max}]$. 

Validity at the good recovering LEFT nodes: At the good recovering LEFT nodes, the result of the 
vote in process P4C is in the interval $[v_{P0,\min}, v_{P0,\max}]$. 

Proof: The proof for process P4 at the trustworthy LEFT nodes applies here. 

Agreement propagation at the trustworthy LEFT nodes: The result of the vote for process P4 at 
the trustworthy LEFT nodes is equal to the result of the vote for process P2 at the trustworthy LEFT 
nodes. 

Proof: The vote results for process P3 at the trustworthy RIGHT nodes are known to exactly agree 
with one another and with the vote results for process P2 at the trustworthy LEFT nodes. The validity 
property and the agreement propagation property for a stage operation ensure that the results for process 
P4 at the trustworthy LEFT nodes exactly match the results for process P3 at the trustworthy RIGHT 
nodes. Thus: $v_{P4,\min} = v_{P4,\max} = v_{P2,\min} = v_{P2,\max}$. 

Agreement propagation at the good recovering LEFT nodes: The result of the vote for process 
P4C at the good recovering LEFT nodes is equal to the result of the vote for process P2 at the trustworthy 
LEFT nodes. 

Proof: The proof for process P4 at the trustworthy LEFT nodes applies here. 


A.6.2.2.2. Voting with inexact communication 

The following properties hold for an agreement propagation phase with inexact communication. 

Validity at the trustworthy RIGHT nodes: At the trustworthy RIGHT nodes, the result of the vote 
in process P3 is in the interval $[v_{P0,\min} - 3\epsilon_l, v_{P0,\max} + 3\epsilon_h]$. 

Proof: The application of the validity property for stages 1 through 3 constrains the result of voting 
operations in process P3 at the trustworthy RIGHT nodes to the interval $[v_{P0,\min} - 3\epsilon_l, v_{P0,\max} + 3\epsilon_h]$. 

Validity at the good recovering RIGHT nodes: At the good recovering RIGHT nodes, the result of 
the vote in process P3C is in the interval $[v_{P0,\min} - 3\epsilon_l, v_{P0,\max} + 3\epsilon_h]$. 

Proof: The proof for process P3 at the trustworthy RIGHT nodes applies here. 

Agreement at the trustworthy RIGHT nodes: The voting results for process P3 at the trustworthy 
RIGHT nodes agree within $2\epsilon$ (i.e., $\Delta_{P3} = 2\epsilon$). 

Proof: Two cases must be considered. 

Case 1: $|\mathrm{Asym\_EV}_{P3,i}| = 0$: According to the agreement generation property for a stage operation with 
inexact communication, the voting results for process P3 at the trustworthy RIGHT nodes will agree 
within $\epsilon$. 

Case 2: $|\mathrm{Asym\_EV}_{P2,j}| = 0$: The agreement generation property for a stage operation with inexact 
communication ensures that the voting results for process P2 at trustworthy LEFT nodes agree within $\epsilon$. 
The agreement propagation property for a stage operation ensures that the voting results for process P3 at 
trustworthy RIGHT nodes differ by at most another $\epsilon$. 

Agreement at the good recovering RIGHT nodes: The voting results for process P3C at the good 
recovering RIGHT nodes agree within $3\epsilon$ (i.e., $\Delta_{P3C} = 3\epsilon$). 

Proof: The number of asymmetric untrustworthy eligible voters for process P3C may be nonzero. The 
worst case agreement among the voting results for process P2 is $2\epsilon$ (i.e., $\Delta_{P2} = 2\epsilon$). The agreement 
propagation property for a stage operation ensures that the voting results for process P3C at good 
recovering RIGHT nodes differ by at most another $\epsilon$. Thus: $\Delta_{P3C} = \Delta_{P2} + \epsilon = 3\epsilon$. 

Agreement propagation at the trustworthy RIGHT nodes: The voting results for process P3 at the 
trustworthy RIGHT nodes and the voting results for process P2 at the trustworthy LEFT nodes agree 
within $\Delta_{P2} + \max(\epsilon_l, \epsilon_h)$. 

Proof: The voting results for process P2 at the trustworthy LEFT nodes are in the interval 
$[v_{P2,\min}, v_{P2,\max}]$ and agree within $\Delta_{P2}$. The voting results for process P3 at the trustworthy RIGHT nodes are in the 
interval $[v_{P2,\min} - \epsilon_l, v_{P2,\max} + \epsilon_h]$. Thus, the maximum difference between voting results at P2 and P3 is 

$\max((v_{P2,\max} + \epsilon_h) - v_{P2,\min},\ v_{P2,\max} - (v_{P2,\min} - \epsilon_l)) = \Delta_{P2} + \max(\epsilon_l, \epsilon_h)$. 

Agreement propagation at the good recovering RIGHT nodes: The voting results for process P3C 
at the good recovering RIGHT nodes and the voting results for process P2 at the trustworthy LEFT nodes 
agree within $\Delta_{P2} + \max(\epsilon_l, \epsilon_h)$. 

Proof: The proof for process P3 at the trustworthy RIGHT nodes applies here. 
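
The bound on the difference between P2 and P3 results can be confirmed by brute force on an integer grid. The sketch below is illustrative only: the function name and the restriction to integer values are assumptions made to keep the enumeration finite. It compares every value in the P2 interval against every value in the P3 validity interval, which is widened by the low-side bound on one end and the high-side bound on the other.

```python
def max_difference(p2_min, p2_max, eps_l, eps_h):
    """Largest gap between a P2 result and a P3 result on an integer grid."""
    p2_values = range(p2_min, p2_max + 1)
    p3_values = range(p2_min - eps_l, p2_max + eps_h + 1)
    return max(abs(a - b) for a in p2_values for b in p3_values)
```

For a P2 interval from 10 to 14 with error bounds 1 and 3, the function returns 7, which is the P2 range of 4 plus the larger of the two error bounds. Swapping the error bounds gives the same result, since only the larger one matters.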

Agreement between trustworthy RIGHT nodes and good recovering RIGHT nodes: The voting 
results for process P3 at the trustworthy RIGHT nodes and the voting results for P3C at the good 
recovering RIGHT nodes agree within $3\epsilon$. 

Proof: The worst case agreement among the voting results for process P2 is $2\epsilon$ (i.e., $\Delta_{P2} = 2\epsilon$). 
According to the validity property for a stage operation, the voting results for processes P3 at the 
trustworthy RIGHT nodes and P3C at the good recovering RIGHT nodes are in the interval 
$[v_{P2,\min} - \epsilon_l, v_{P2,\max} + \epsilon_h]$, which has a range of $\Delta_{P2} + \epsilon = 3\epsilon$. The voting results for process P3 at the trustworthy RIGHT nodes 
are in the validity interval and agree with one another within $2\epsilon$. However, although the voting results for 
process P3C at the good recovering RIGHT nodes are in the validity interval, their agreement bound is $3\epsilon$, 
which is equal to the range of the validity interval. In the worst case, voting results for processes P3 and 
P3C can be at opposite extremes of the validity interval and differ by at most $3\epsilon$. 

Validity at the trustworthy LEFT nodes: At the trustworthy LEFT nodes, the result of the vote in 
process P4 is in the interval $[v_{P0,\min} - 4\epsilon_l, v_{P0,\max} + 4\epsilon_h]$. 

Proof: The application of the validity property for stages 1 through 4 constrains the result of voting 
operations in process P4 at the trustworthy LEFT nodes to the interval $[v_{P0,\min} - 4\epsilon_l, v_{P0,\max} + 4\epsilon_h]$. 

Validity at the good recovering LEFT nodes: At the good recovering LEFT nodes, the result of the 
vote in process P4C is in the interval $[v_{P0,\min} - 4\epsilon_l, v_{P0,\max} + 4\epsilon_h]$. 

Proof: The proof for process P4 at the trustworthy LEFT nodes applies here. 
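
Since each stage with inexact communication can widen the validity interval by the low-side bound at the bottom and the high-side bound at the top, the accumulated interval after any number of stages follows directly. The helper below is a hypothetical convenience for tabulating these intervals, not part of the ROBUS design.

```python
def validity_interval(v_min, v_max, eps_l, eps_h, stages):
    """Validity interval after the given number of stages with inexact communication.

    v_min, v_max: bounds on the values transmitted by trustworthy nodes in process P0.
    eps_l, eps_h: low-side and high-side communication error bounds.
    stages: number of stage operations applied (2 for P2, 3 for P3 or P3C, 4 for P4 or P4C).
    """
    if stages < 0:
        raise ValueError("stages must be non-negative")
    return (v_min - stages * eps_l, v_max + stages * eps_h)
```

For example, with P0 values between 0 and 10 and error bounds 1 and 2, four stages give the interval from -4 to 18.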

Agreement at the trustworthy LEFT nodes: The voting results for process P4 at the trustworthy 
LEFT nodes agree within $2\epsilon$ (i.e., $\Delta_{P4} = 2\epsilon$). 

Proof: Two cases must be considered. 

Case 1: $|\mathrm{Asym\_EV}_{P3,i}| = 0$: According to the agreement generation property for a stage operation with 
inexact communication, the voting results for process P3 at the trustworthy RIGHT nodes will agree 
within $\epsilon$. The agreement propagation property for a stage operation ensures that the voting results for 
process P4 at trustworthy LEFT nodes differ by at most another $\epsilon$. 

Case 2: $|\mathrm{Asym\_EV}_{P4,j}| = 0$: The agreement generation property for a stage operation with inexact 
communication ensures that the voting results for process P4 at trustworthy LEFT nodes agree within $\epsilon$. 


Agreement at the good recovering LEFT nodes: The voting results for process P4C at the good 
recovering LEFT nodes agree within $3\epsilon$ (i.e., $\Delta_{P4C} = 3\epsilon$). 

Proof: The number of asymmetric untrustworthy eligible voters for process P4C may be nonzero. The 
worst case agreement among the voting results for process P3 is $2\epsilon$ (i.e., $\Delta_{P3} = 2\epsilon$). The agreement 
propagation property for a stage operation ensures that the voting results for process P4C at good 
recovering LEFT nodes differ by at most another $\epsilon$. Thus: $\Delta_{P4C} = \Delta_{P3} + \epsilon = 3\epsilon$. 

Agreement propagation at the trustworthy LEFT nodes: The voting results for process P4 at the
trustworthy LEFT nodes and the voting results for process P2 at the trustworthy LEFT nodes agree within
$\Delta_{P2} + 2\max(\epsilon_l, \epsilon_h)$.

Proof: The voting results for process P2 at the trustworthy LEFT nodes are in the interval
$[v_{P2,min},\ v_{P2,max}]$ and agree within $\Delta_{P2}$. Application of the validity property for stages 3 and 4
constrains the voting results for process P4 at the trustworthy LEFT nodes to the interval
$[v_{P2,min} - 2\epsilon_l,\ v_{P2,max} + 2\epsilon_h]$. Thus, the maximum difference between voting results at P2 and P4 is
$\max((v_{P2,max} + 2\epsilon_h) - v_{P2,min},\ v_{P2,max} - (v_{P2,min} - 2\epsilon_l)) = \Delta_{P2} + 2\max(\epsilon_l, \epsilon_h)$.


Agreement propagation at the good recovering LEFT nodes: The voting results for process P4C at
the good recovering LEFT nodes and the voting results for process P2 at the trustworthy LEFT nodes
agree within $\Delta_{P2} + 2\max(\epsilon_l, \epsilon_h)$.

Proof: The proof for process P4 at the trustworthy LEFT nodes applies here. 

Agreement between trustworthy LEFT nodes and good recovering LEFT nodes: The voting
results for process P4 at the trustworthy LEFT nodes and the voting results for process P4C at the good
recovering LEFT nodes agree within $3\epsilon$.

Proof: The worst-case agreement among the voting results for process P3 is $2\epsilon$ (i.e., $\Delta_{P3} = 2\epsilon$).
According to the validity property for a stage operation, the voting results for process P4 at the
trustworthy LEFT nodes and process P4C at the good recovering LEFT nodes are in the interval
$[v_{P3,min} - \epsilon_l,\ v_{P3,max} + \epsilon_h]$, which has a range of $3\epsilon$. The voting results for process P4 at the
trustworthy LEFT nodes are in the validity interval and agree with one another within $2\epsilon$. Although the
voting results for process P4C at the good recovering LEFT nodes are in the validity interval, their
agreement is within $3\epsilon$, which is equal to the range bound of the validity interval. In the worst case,
voting results for processes P4 and P4C can be at opposite extremes of the validity interval and differ by
at most $3\epsilon$.


A.7. Stage operations of ROBUS protocols 

The ROBUS protocols process PE messages and state data. The required computation for the actual
protocols goes beyond the basic middle-value-select voting function. However, the dynamic
middle-value-select voting function is adaptable enough to cover all the required stage operations that
involve computation processes at the ROBUS nodes. The following subsections describe how a
middle-value-select voter can be adapted for the actual ROBUS protocols. The main purpose of this section
is to establish a link between the theory presented in this appendix and the actual ROBUS protocols.


A.7.1. Event voting 

The time synchronization protocols use middle-value-select voting for the processing of timing events.
The voting function for these protocols is referred to as the Accept function. These protocols use a fixed- 
delay communication model, which corresponds to the inexact communication model presented above in 
terms of the precision of received values. 


A.7.2. Routing 

A ROBUS node performs a routing function by relaying a message received from a particular input 
source. A middle-value-select voter can perform a routing function by including in the set of eligible 
voters only the input source of interest. This function is used with the synchronous communication 
model, which corresponds to the exact communication model in the protocol theory. As implemented in 
the ROBUS protocols, voter eligibility for the routing function also takes into consideration input errors 
and local accusations. The special case of an empty set of eligible voters is handled to ensure protocol 
results consistent with the basic protocol theory presented in the previous section. 


A.7.3. Word voting 

The unit of data for word voting is the ROBUS Message. The word voting function implemented in 
the ROBUS protocols is an exact-match majority word vote. This function is used with the synchronous 
communication model, which corresponds to the exact communication model. The vote result equals the 
majority value if an exact-match majority exists (i.e., a majority of eligible inputs are exactly equal). 
Otherwise, the result is invalid and a signal is asserted indicating that there is not a majority value among 
the eligible inputs. 

If there is exact agreement among a majority of the values received from eligible voters, then the result
of a word vote equals the result of a middle-value-select vote with the same set of eligible voters. The
ROBUS protocols with word voting handle a no-majority condition as an exception, and the
corresponding result and interpretation depend on the protocol and the protocol stage being executed.
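
The exact-match majority word vote described above can be sketched as follows. This is an illustrative sketch only, not the ROBUS implementation: the function name `word_vote`, the use of `None` for an invalid result, and the treatment of an empty set of eligible inputs as a no-majority condition are assumptions made for the example.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def word_vote(eligible_inputs: Sequence[Hashable]) -> Tuple[Optional[Hashable], bool]:
    """Exact-match majority vote over the values received from eligible voters.

    Returns (value, True) when strictly more than half of the inputs are
    exactly equal, and (None, False) to signal the no-majority condition.
    """
    if not eligible_inputs:
        # Assumption: an empty set of eligible inputs is reported as no majority.
        return None, False
    value, count = Counter(eligible_inputs).most_common(1)[0]
    if 2 * count > len(eligible_inputs):
        return value, True
    return None, False
```

How the caller reacts to a `False` validity flag is left open here, since the text states that the interpretation depends on the protocol and the protocol stage.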


A.7.4. Bit voting 

The data for bit voting are the bits from the Payload field of ROBUS Messages. Bit voting is used to 
process diagnostic data. This function is used with the synchronous communication model, which 
corresponds to the exact communication model. The bits are interpreted as Boolean variables with TRUE 
or FALSE values. Bit voting is an exact-match majority bit vote in which the result of a vote equals the 
value of the majority if an exact-match majority exists. Otherwise, the result is equal to TRUE. 

As for word voting, if there is exact agreement among a majority of the values received from eligible
voters, then the result of a bit vote equals the result of a middle-value-select vote with the same set of
eligible voters. The no-majority condition is an exceptional case compatible in terms of validity and
agreement with the protocol theory presented previously.
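
A minimal sketch of the bit-voting rule follows. The function name `bit_vote` is hypothetical, and the sketch assumes that an empty set of eligible inputs falls under the no-majority rule (result TRUE); the source text does not address that case explicitly.

```python
from typing import Sequence


def bit_vote(eligible_bits: Sequence[bool]) -> bool:
    """Exact-match majority bit vote; the result is TRUE when there is no majority."""
    falses = sum(1 for bit in eligible_bits if not bit)
    # FALSE is returned only when FALSE holds a strict majority of the eligible inputs.
    # A TRUE majority and the no-majority condition both yield TRUE.
    return not (2 * falses > len(eligible_bits))
```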

A.8. ROBUS fault assumptions


The ROBUS fault assumptions are derived from the generic analysis presented in this appendix and 
the specific protocol analyses presented in other appendices of this document. The assumptions are 
sufficient conditions for ensuring that the protocol results are correct. The assumptions depend on the 
mode of operation. 


A.8.1.1. Clique Initialization and Clique Preservation modes 

The following conditions are assessed for each voting protocol executed in these modes. A violation 
of these conditions may result in a protocol failure. Failure is assessed from the perspective of a clique 
rather than individual nodes. 

• For each receiving process at each trustworthy node, all trustworthy sources of the opposite kind 
are eligible to vote. 

• For each receiving process, the trustworthy receiving nodes of a given kind agree on the eligibility 
of non-asymmetric sources of the opposite kind. 

• There are no asymmetric eligible voters for any of the receiving processes at every trustworthy 
BIU receiver or at every trustworthy RMU receiver. 

• For each receiving process at each trustworthy node, the set of eligible voters contains more 
trustworthy sources than untrustworthy ones. (For the Schedule Update and the PE Broadcast 
protocols, the number of eligible voters for process P1 may be zero without compromising the
protocol properties. This is examined in Appendices D and E, respectively.) 


A.8.1.2. Clique Join mode 

The following conditions apply to good recovering nodes in the Clique Join mode. The conditions are 
assessed for each voting protocol executed in this mode. A violation of these conditions may result in a 
protocol failure for the recovering node. 

• For each receiving process, all trustworthy sources of the opposite kind are eligible to vote. 

• For each receiving process, the recovering node agrees with the trustworthy nodes of the same 
kind on the eligibility of non-asymmetric sources of the opposite kind. 

• There are no asymmetric eligible voters for any of the receiving processes at the good recovering 
node and every trustworthy receiver of the same kind, or at every trustworthy receiver of the 
opposite kind. 

• For each receiving process, the set of eligible voters contains more trustworthy sources than
untrustworthy ones. (For the Schedule Update and the PE Broadcast protocols, the number of
eligible voters for process P1 may be zero without compromising the protocol properties. This is
examined in Appendices D and E, respectively.)

A.8.1.3. Clique Detection mode

The following conditions apply to good recovering nodes in the Clique Detection mode. The 
conditions are assessed for each voting protocol executed in this mode. A violation of these conditions 
may result in a protocol failure for the recovering node. 

• For each receiving process, the set of eligible voters contains more trustworthy sources than
untrustworthy ones.

Appendix B. Point-to-point communication


This appendix examines the point-to-point communication between ROBUS nodes. Each 
ROBUS node is driven by an independent physical oscillator and a logical time clock, referred to 
as a local-time clock, that keeps track of the passage of time as indicated by the oscillator. The 
Communication Module of each ROBUS node is composed of transmit and receive sub-modules. 
The transmit sub-module consists of one or more separate transmitters to support broadcast 
transmissions. The receive sub-module consists of a separate receiver for each node of the 
opposite kind. The transmitters and receivers are expected to be generic components supporting 
event-triggered communication. The granularity of a Communication Module transaction should 
be a ROBUS Message, since the communication, processing, and diagnosis performed by the 
ROBUS protocols are based on single-message transactions. For the transmitters, the reading of a 
new message and the beginning of its transmission process is triggered by a send signal at the 
transmitter’s input interface. Similarly, the receivers should be able to receive new messages 
whenever they arrive. The only expected communication throughput constraint at the input 
interface of the transmitters is the minimum data introduction interval (DII), which is the 
minimum number of clock ticks between consecutive requests to send messages. 

The communication system must be able to support the fixed-delay and synchronous 
communication models. For some receiver designs, the output signals from the receiver are not 
synchronized to the circuitry-driving signal generated by a local physical oscillator. Therefore, 
the Computation Module must synchronize each received message with respect to the local 
oscillator before proceeding with further processing. For the synchronous communication model, 
the processing of received messages is triggered by the local-time clock. Therefore, a node must 
be able to buffer received messages until it is time to process them. The timing design of the 
system must be able to handle the uncertainty in the time of transmission, the transmission delay, 
and the synchronization delay. 

In addition, this version of the ROBUS is intended to demonstrate that the bus can achieve a 
PE-message throughput that approaches the available bandwidth at the physical links. For most 
transmissions, it is possible to compute a local-time interval during which a receiver should 
expect to receive the message. For low link data rates, the reception intervals for individual 
nodes do not overlap and each message can be processed before the next one arrives. For high 
data rates, the reception intervals of consecutive messages overlap and the processing must be 
pipelined in order to match the link throughput. This appendix examines some critical aspects of 
pipelined communication. 

In what follows, the term oscillator clock denotes the signal generated by the physical 
oscillator, the local-time clock refers to the logical-time clock, and the local time refers to the 
state of the logical-time clock. The process of synchronizing a signal or a message to the 
transitions of the oscillator clock is referred to as signal synchronization. The process of 
synchronizing a message to the local time is referred to as deskewing. 


B.1. Physical oscillators and local-time clocks

Each ROBUS node is driven by an independent, free-running physical oscillator (i.e., the
phase is not controlled in any way) and a logical-time clock (i.e., a counter) that keeps track of
the passage of time as indicated by the oscillator. An oscillator tick, also called a clock tick or a
system tick, is the basic unit of time on the bus. Let $f_0$ denote the nominal frequency of an
oscillator measured in ticks per second or Hertz (Hz). The duration of a tick for an ideal
oscillator is exactly $1/f_0$ seconds. An ideal oscillator is said to have zero drift rate with respect to
real time since the oscillator perfectly marks the passage of time with a tick duration of exactly
$1/f_0$ seconds. Real oscillators are characterized by non-zero drift rates with respect to real time.
It is assumed that the drift rate of the physical oscillators is bounded by a small constant
$\rho_0$, which is positive, real valued, and unitless.

The bound on the drift of the physical oscillators is interpreted as follows. Let $c_x(T)$ denote
the earliest real time at which local-time clock $x$ reaches value $T$. $c_x(T)$ has units of nominal
ticks (1 nominal tick = $1/f_0$ seconds). $T_1$ and $T_2$ denote arbitrary values of the local-time clock
with the constraint $T_2 > T_1$. Then:

$(T_2 - T_1)/(1 + \rho_0) \le c_x(T_2) - c_x(T_1) \le (1 + \rho_0)(T_2 - T_1)$ (B.1)

Let $\tau_0$ denote the nominal tick duration measured in seconds (i.e., 1 nominal tick = $\tau_0$ seconds
= $1/f_0$ seconds). $\tau_x$ denotes the actual tick duration of local-time clock $x$. The bound on the drift
rate of clock $x$ can be expressed as follows:

$\tau_0/(1 + \rho_0) \le \tau_x \le (1 + \rho_0)\tau_0$ (B.2)

In other words, the fastest clock has a tick duration of at least $1/(1 + \rho_0)$ nominal ticks, and the
slowest clock has a tick duration of at most $(1 + \rho_0)$ nominal ticks. This simple model accounts
for the drift with respect to real time of the physical oscillators and the local-time clocks. The
point-to-point communication model accounts for jitter on the output of the physical oscillator.
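
As a small illustration of the bound in (B.1), the sketch below computes the shortest and longest real-time durations, in nominal ticks, that a drift-bounded clock may take to advance between two local-time values. The function name `real_time_bounds` and the example drift value are assumptions chosen for illustration, not values from the ROBUS design.

```python
from typing import Tuple


def real_time_bounds(t1: int, t2: int, rho0: float) -> Tuple[float, float]:
    """Bounds from (B.1) on c_x(T2) - c_x(T1), in nominal ticks, for T2 > T1."""
    if t2 <= t1:
        raise ValueError("(B.1) assumes T2 > T1")
    if rho0 <= 0:
        raise ValueError("the drift bound rho0 is assumed to be positive")
    span = t2 - t1
    fastest = span / (1.0 + rho0)  # clock running at the fast limit
    slowest = span * (1.0 + rho0)  # clock running at the slow limit
    return fastest, slowest


# Hypothetical example: 1000 local ticks with a drift bound of 1e-4.
lo, hi = real_time_bounds(0, 1000, 1e-4)
```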


B.2. Synchronization of asynchronous signals 

Single-phase edge-triggered flip-flops used as building blocks in traditional synchronous
sequential digital circuits have a simple nominal timing behavior: If the signal at the data input is
stable within a specified window around the oscillator clock’s triggering edge, then the input
value will propagate to the output of the flip-flop and stabilize within some guaranteed time. The
propagation delay of the flip-flops is the time elapsed from the triggering edge of the oscillator
clock until the output is stable. The window around the oscillator clock’s triggering edge is
characterized by the setup and hold time of the flip-flop. The setup time is the minimum time
that the input signal must remain stable before the triggering edge of the oscillator clock in order
for the output of the flip-flop to meet the nominal propagation delay. The hold time is the
minimum time that the input signal must remain stable after the triggering edge of the oscillator
clock in order for the output of the flip-flop to meet the nominal propagation delay.

The domain of an oscillator clock includes all the digital circuitry driven by that signal. A
signal is said to be synchronous with respect to a particular oscillator clock if the timing of the
signal meets the input setup and hold time constraints of the flip-flops driven by the oscillator clock.
A signal that does not meet these constraints is called asynchronous with respect to the given
oscillator clock. Since the oscillator clocks in the fault containment regions (FCRs) of the ROBUS are
independent and the timing of their transitions is not coordinated in any way, any signal crossing
from one FCR to another is considered asynchronous when it arrives at the receiving FCR.

Asynchronous signals must be synchronized to the oscillator clock before they can be
processed. Various mechanisms can be used to achieve this synchronization. Ultimately,
however, consideration must be given to the problem of violations of the setup and hold times of
flip-flops reading the signal. A flip-flop sampling an input that is not stable within the setup and
hold window can enter a metastable condition in which the output does not settle to a valid logic
state within the nominal propagation delay. If not handled properly, this can result in the
generation of more asynchronous signals and the propagation of errors throughout the receiving
FCR.

The mean time between failure (MTBF) for a flip-flop reading an asynchronous input is (see
[XAPP077]):

$MTBF = e^{C_2 \cdot t_{MET}} / (2 \cdot C_1 \cdot f_D \cdot f_C)$ (B.3)

where $t_{MET}$ denotes the time available for the metastability to resolve itself (i.e., time allowed by
downstream circuitry before reading the output of the flip-flop), $f_D$ denotes the input signal
frequency ($2 f_D$ is the input signal event rate), $f_C$ denotes the oscillator clock frequency, $C_1$
denotes the metastability aperture of the flip-flop (related to the width of the window during
which an input can cause a metastability condition), and $C_2$ denotes the resolution rate (related to
the speed with which the metastable condition will be resolved). Constants $C_1$ and $C_2$ are
functions of the process technology and flip-flop design. For current technology, the variables of
the MTBF can be selected such that the probability of metastability failures is extremely small.
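
The sensitivity of the MTBF to the resolution time can be explored with a short sketch. It assumes that (B.3) has the usual exponential form, MTBF = exp(C2 * t_MET) / (2 * C1 * f_D * f_C), with C1 in seconds, C2 in 1/seconds, and frequencies in Hz. The function name and any numeric values passed to it are hypothetical and are not taken from [XAPP077] or from the ROBUS hardware.

```python
import math


def metastability_mtbf(c1: float, c2: float, t_met: float, f_d: float, f_c: float) -> float:
    """MTBF in seconds for a flip-flop sampling an asynchronous input.

    Assumed form of (B.3): exp(C2 * t_MET) / (2 * C1 * f_D * f_C).
    """
    if min(c1, f_d, f_c) <= 0:
        raise ValueError("C1, f_D and f_C must be positive")
    return math.exp(c2 * t_met) / (2.0 * c1 * f_d * f_c)
```

Because the resolution time appears in the exponent, each additional interval of 1/C2 seconds allowed for resolution multiplies the MTBF by e, which is why a modest increase in $t_{MET}$ is usually sufficient to make metastability failures very unlikely.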

In what follows, it is assumed that the problem of metastability is properly handled by the 
implementation of the ROBUS. For analysis, unless explicitly stated otherwise, it is assumed that 
the nodes have ideal signal synchronizers, each consisting of a single flip-flop driven by the 
oscillator clock. These ideal flip-flops have no metastable states and zero propagation delay. The 
timing behavior is as follows. 

If the input changes just before the triggering-edge of the oscillator clock, this latest input 
value will propagate to the output as soon as the triggering-edge of the oscillator clock 
arrives. If the input changes at exactly the same time as the triggering-edge of the oscillator 
clock, the input value will not affect the output until the next triggering-edge of the oscillator 
clock (assuming that the input remains constant). 


B.3. Single-message communication 

The communication of a message from a source node to a receiver node is modeled as a
four-step process: (1) Send: The Computation Module of the source node signals the transmitter(s) in
the Communication Module that a message is ready for transmission; (2) Transmission: The
transmitter reads the message and transmits the corresponding signals over the transmission
medium; (3) Delivery: The link receiver gets the message from the transmission medium and
signals the arrival to the signal synchronizer; (4) Reception: The synchronizer signals the arrival
of a new message to the Computation Module. Figure B.1 illustrates the point-to-point
communication path. $CLK_{Rx}$ denotes the oscillator clock at the receiving node. The message
delivery delay is the time elapsed from the instant a transmitter receives a send request until the
message is presented at the output interface of the receiver. The message reception delay is
equal to the message delivery delay plus the additional time delay to synchronize the received
message to the oscillator clock at the receiving node.


[Diagram: Send → Transmission → Delivery → Synchronizer (clocked by $CLK_{Rx}$) → Reception]

Figure B.1: Point-to-point communication path

Let $d_{PP,l}$ and $d_{PP,h}$ denote the minimum and maximum point-to-point message delivery delays,
respectively, measured in units of nominal clock ticks. $v_{PP}$ denotes the delivery precision (i.e., the
uncertainty in the point-to-point delivery delay) measured in units of nominal clock ticks. $r_{PP,l}$
and $r_{PP,h}$ denote the minimum and maximum point-to-point message reception delays,
respectively, measured in nominal clock ticks. $\epsilon_{PP}$ denotes the reception precision (i.e., the
uncertainty in the point-to-point reception delay) measured in nominal clock ticks.


B.3.1. Reception delay 

Let $T_0$ denote the local time at which the source sends the message, and let $t_0$ denote the
corresponding real time. The real-time range of point-to-point message delivery is
$[t_0 + d_{PP,l},\ t_0 + d_{PP,h}]$. Therefore, the delivery precision is:

$v_{PP} = d_{PP,h} - d_{PP,l}$ (B.4)

The minimum point-to-point message reception delay happens when the message is sampled
by the input synchronizer at exactly the same time it is delivered.

$r_{PP,l} = d_{PP,l}$ (B.5)

The maximum point-to-point message reception delay happens when the message is sampled
by the input synchronizer exactly one tick after it is delivered. The worst-case delay occurs when
the oscillator clock at the receiving node is slow.

$r_{PP,h} = d_{PP,h} + (1 + \rho_0)$ (B.6)

The real-time range of reception is $[t_0 + r_{PP,l},\ t_0 + r_{PP,h}]$. Therefore, the reception precision is:

$\epsilon_{PP} = r_{PP,h} - r_{PP,l} = [d_{PP,h} + (1 + \rho_0)] - d_{PP,l} = 1 + \rho_0 + v_{PP}$ (B.7)

$\epsilon_{PP}$ accounts for time-discretization errors, jitter and drift of the source and receiver oscillators,
and slight differences in point-to-point communication delay due to wires/fibers of different lengths.

Next, we define $IMP(x_1, x_2)$, the Integer Mid-Point value, as the integer closest to the
mid-point of $x_1$ and $x_2$. $IMP(x_1, x_2)$ is computed in two steps:

Step 1: $x = (x_1 + x_2)/2$

Step 2: $IMP = round(x)$, with $round(x) = \lfloor x \rfloor$ if $x - \lfloor x \rfloor < 1/2$, or $\lceil x \rceil$ if $x - \lfloor x \rfloor \ge 1/2$

$R_{PP}$ denotes the expected reception delay:

$R_{PP} = IMP(r_{PP,l}, r_{PP,h})$ (B.8)
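
The quantities in (B.4) through (B.8) can be computed together, as in the following sketch. The names `integer_mid_point`, `reception_timing`, and `ReceptionTiming` are hypothetical, exact rational arithmetic is used only to avoid floating-point rounding at the half-way point, and rounding ties upward is an assumption about the intended rounding rule.

```python
import math
from fractions import Fraction
from typing import NamedTuple, Union

Number = Union[int, float, Fraction]


def integer_mid_point(x1: Number, x2: Number) -> int:
    """IMP(x1, x2): integer closest to the mid-point (ties assumed to round up)."""
    mid = (Fraction(x1) + Fraction(x2)) / 2
    lower = math.floor(mid)
    return lower if mid - lower < Fraction(1, 2) else lower + 1


class ReceptionTiming(NamedTuple):
    v_pp: Fraction    # delivery precision (B.4)
    r_pp_l: Fraction  # minimum reception delay (B.5)
    r_pp_h: Fraction  # maximum reception delay (B.6)
    eps_pp: Fraction  # reception precision (B.7)
    R_pp: int         # expected reception delay (B.8)


def reception_timing(d_pp_l: Number, d_pp_h: Number, rho0: Number) -> ReceptionTiming:
    """Evaluate (B.4)-(B.8); all delays are in nominal clock ticks."""
    d_l, d_h, rho = Fraction(d_pp_l), Fraction(d_pp_h), Fraction(rho0)
    r_l = d_l
    r_h = d_h + (1 + rho)
    return ReceptionTiming(d_h - d_l, r_l, r_h, r_h - r_l, integer_mid_point(r_l, r_h))
```

For instance, with hypothetical delivery delays of 3 and 3.5 nominal ticks and a drift bound of 1/10000, the reception delay ranges from 3 to 4.5001 ticks and the expected reception delay is 4 ticks.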


B.3.2. Estimate of the local-time at the source 

Let $T_{RCV}$ denote the local time at the receiver when it receives the message. To estimate the
local time at the source node, the receiver assumes that the message reception delay is $R_{PP}$ ticks of
its oscillator clock. The estimated local time at the source node at the time of reception is:

$T_{SRC,E} = T_0 + R_{PP}$ (B.9)

The error in the local-time estimate is bounded as follows. $T_{RCV}$ occurs no earlier than $\mu_{PP,l}$
nominal ticks before the source reaches local time $T_{SRC,E}$:

$\mu_{PP,l} = (1 + \rho_0) R_{PP} - r_{PP,l}$ (B.10)

$T_{RCV}$ occurs no later than $\mu_{PP,h}$ nominal ticks after the source reaches local time $T_{SRC,E}$:

$\mu_{PP,h} = r_{PP,h} - R_{PP}/(1 + \rho_0)$ (B.11)
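
The two bounds in (B.10) and (B.11) follow from comparing the real-time reception delay, which lies between the minimum and maximum reception delays, with the real time a drifting source clock needs to advance by the expected reception delay. A minimal sketch, with a hypothetical function name and hypothetical example values:

```python
from fractions import Fraction
from typing import Tuple, Union

Number = Union[int, float, Fraction]


def estimate_error_bounds(r_pp_l: Number, r_pp_h: Number, R_pp: int,
                          rho0: Number) -> Tuple[Number, Number]:
    """Return (mu_l, mu_h) from (B.10) and (B.11), in nominal ticks."""
    mu_l = (1 + rho0) * R_pp - r_pp_l  # slow source clock, earliest reception
    mu_h = r_pp_h - R_pp / (1 + rho0)  # fast source clock, latest reception
    return mu_l, mu_h


# Hypothetical values: reception delay in [3, 4.5001] ticks, expected delay of 4 ticks.
mu_l, mu_h = estimate_error_bounds(Fraction(3), Fraction(45001, 10000), 4, Fraction(1, 10000))
```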


B.3.3. Expected local time of reception 

Let $\pi_{PP,SR}$ denote a bound on the relative local-time skew between the source and the receiver
nodes. This bound is assumed to hold for the duration of the communication. The expected local
time of reception at the receiver is denoted by $T_{RCV,E}$.

$T_{RCV,E} = T_0 + R_{PP}$ (B.12)

Due to the relative local-time skew and the uncertainty in the message-reception delay, the
message will arrive within some local-time interval containing $T_{RCV,E}$. Let $\Delta_{PP,RCV}$ denote the
local-time error in $T_{RCV}$:

$\Delta_{PP,RCV} = T_{RCV} - T_{RCV,E}$ (B.13)

We want to determine the absolute maximum local-time error in $T_{RCV}$, denoted by $\Delta_{PP,RCV}|_{\text{abs-max}}$:

$|T_{RCV} - T_{RCV,E}| \le \Delta_{PP,RCV}|_{\text{abs-max}}$ (B.14)

The value of $\Delta_{PP,RCV}|_{\text{abs-max}}$ is derived as follows. The bound on the local-time synchronization
between the source and the receiver nodes is expressed as:

$|c_{SRC}(T) - c_{RCV}(T)| \le \pi_{PP,SR}$ (B.15)

where $c_{SRC}(T)$ and $c_{RCV}(T)$ denote the earliest real times at which the local times at the source and
at the receiver, respectively, reach value $T$. From the previous analysis, it is known that the
real-time difference between the time when the source reaches $T_{RCV,E}$ and the time when the message
is actually received, $T_{RCV}$, is bounded above and below by $\mu_{PP,h}$ and $\mu_{PP,l}$, respectively.

$c_{SRC}(T_{RCV,E}) - \mu_{PP,l} \le c_{RCV}(T_{RCV}) \le c_{SRC}(T_{RCV,E}) + \mu_{PP,h}$ (B.16)

For local time $T_{RCV,E}$, inequality (B.15) can be re-expressed as:

$c_{SRC}(T_{RCV,E}) - \pi_{PP,SR} \le c_{RCV}(T_{RCV,E}) \le c_{SRC}(T_{RCV,E}) + \pi_{PP,SR}$ (B.17)

Combining inequalities (B.16) and (B.17), we get:

$-\pi_{PP,SR} - \mu_{PP,l} \le c_{RCV}(T_{RCV}) - c_{RCV}(T_{RCV,E}) \le \pi_{PP,SR} + \mu_{PP,h}$ (B.18)

So:

$|c_{RCV}(T_{RCV}) - c_{RCV}(T_{RCV,E})| \le \max(\pi_{PP,SR} + \mu_{PP,l},\ \pi_{PP,SR} + \mu_{PP,h})$ (B.19)

Equivalently:

$|c_{RCV}(T_{RCV}) - c_{RCV}(T_{RCV,E})| \le \pi_{PP,SR} + \max(\mu_{PP,l}, \mu_{PP,h})$ (B.20)

Using the constraint that the local clocks are $\rho$-bounded, the definition of $\Delta_{PP,RCV}$, and the real
time duration of $\Delta_{PP,RCV}$ ticks for the fastest allowed clock:

$|\Delta_{PP,RCV}|/(1 + \rho_0) \le |c_{RCV}(T_{RCV}) - c_{RCV}(T_{RCV,E})|$ (B.21)

Combining (B.20) and (B.21):

$|\Delta_{PP,RCV}| \le (1 + \rho_0)(\pi_{PP,SR} + \max(\mu_{PP,l}, \mu_{PP,h}))$ (B.22)

Since $\Delta_{PP,RCV}$ is an integer, we can take the floor in (B.22):

$|\Delta_{PP,RCV}| \le \lfloor (1 + \rho_0)(\pi_{PP,SR} + \max(\mu_{PP,l}, \mu_{PP,h})) \rfloor$ (B.23)

Therefore, the worst-case local-time difference between the actual time of reception $T_{RCV}$ and the
expected time of reception $T_{RCV,E}$ is:

$\Delta_{PP,RCV}|_{\text{abs-max}} = \lfloor (1 + \rho_0)(\pi_{PP,SR} + \max(\mu_{PP,l}, \mu_{PP,h})) \rfloor$ (B.24)
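
The end result of this derivation, (B.24), is a one-line computation. The sketch below uses a hypothetical function name and hypothetical example values; the skew bound and the two estimate-error bounds are in nominal ticks, and the result is in local clock ticks.

```python
import math
from fractions import Fraction
from typing import Union

Number = Union[int, float, Fraction]


def max_reception_time_error(pi_sr: Number, mu_l: Number, mu_h: Number, rho0: Number) -> int:
    """Worst-case |T_RCV - T_RCV,E| in local clock ticks, per (B.24)."""
    return math.floor((1 + rho0) * (pi_sr + max(mu_l, mu_h)))


# Hypothetical values: skew bound of 2 ticks, mu_l = 1.0004, mu_h = 0.5, rho0 = 1e-4.
delta_abs_max = max_reception_time_error(
    Fraction(2), Fraction(10004, 10000), Fraction(1, 2), Fraction(1, 10000))
```

With these example inputs the bound before the floor is slightly above 3 ticks, so the message is expected within 3 local ticks of the expected time of reception.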

B.4. Coordination for synchronous communication 

For the synchronous ROBUS protocols, the scheduling of operations is based on a distributed
synchronous composition abstract model of the system in which a single oscillator drives a
common local-time clock and fixed-delay processes corresponding to the communication and
computation operations of the BIUs and RMUs. Communication during time-driven operations is
time-triggered. For each transmission, the sources and receivers use a particular local-time value
as a distributed reference event to coordinate their actions. Given specific bounds for the
reception delay and the relative local-time skew between sources and receivers, it is possible to
coordinate the send and receive operations such that the transmitted messages are received within
a predetermined local-time range measured at the receivers. The receivers can then apply a
deskewing function and forward the received messages for processing at a predetermined local
time. By leveraging the previous analysis, it is possible to analyze the source-receiver
coordination problem using only global time (i.e., synchronized local time viewed from a global
perspective). Figure B.2 illustrates the relevant timing events. $T_{REF}$ denotes the reference
local-time value. $T_{SND}$ is the time at which the message is sent. $R_{PP}$ is the expected reception delay.
$T_{RCV,E}$ is the expected time of reception. $W_{Deskew}$ is the size of the deskewing window. $W_{Deskew,pre}$
is the pre-expectation window (i.e., the size of the section of the deskewing window before the
expected time of reception). $W_{Deskew,post}$ is the post-expectation window (i.e., the size of the
section of the deskewing window after the expected time of reception). $T_{PROC,begin}$ denotes the
time for the beginning of message processing.



[Timeline marking the deskewing window from $T_{RCV,E} - W_{Deskew,pre}$ to $T_{RCV,E} + W_{Deskew,post}$]

Figure B.2: Timing events for point-to-point communication

A message from a good source is expected to arrive during the following closed time interval,
which includes all triggering edges of the local clock within the expected time range of reception:

$[T_{RCV,E} - \Delta_{PP,RCV}|_{\text{abs-max}},\ T_{RCV,E} + \Delta_{PP,RCV}|_{\text{abs-max}}]$ (B.25)

The deskewing window includes all triggering edges of the clock within the expected time
range of reception, a total of $2\Delta_{PP,RCV}|_{\text{abs-max}} + 1$ edges. The deskewing window is intended to
cover the duration of all local clock counts corresponding to the triggering edges of the clock
within the expected time range of reception. The local clock counts corresponding to these
triggering edges determine a time interval with a duration of $2\Delta_{PP,RCV}|_{\text{abs-max}} + 1$ ticks. The
deskewing window extends for the real-time interval corresponding to the following half-closed
local-time interval:

$[T_{RCV,E} - \Delta_{PP,RCV}|_{\text{abs-max}},\ T_{RCV,E} + \Delta_{PP,RCV}|_{\text{abs-max}} + 1)$ (B.26)

So:

$W_{Deskew} = 2\Delta_{PP,RCV}|_{\text{abs-max}} + 1$ (B.27)

Then:

$W_{Deskew,pre} = \Delta_{PP,RCV}|_{\text{abs-max}}$ (B.28)

$W_{Deskew,post} = \Delta_{PP,RCV}|_{\text{abs-max}} + 1$ (B.29)

For proper communication, the following constraints must be satisfied:

$T_{REF} \le T_{SND}$ (B.30)

$T_{REF} \le T_{RCV,E} - W_{Deskew,pre}$ (B.31)

$T_{RCV,E} = T_{SND} + R_{PP}$ (B.32)

$T_{RCV,E} + W_{Deskew,post} \le T_{PROC,begin}$ (B.33)
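
The window sizes in (B.27) through (B.29) and the four timing constraints can be checked mechanically. The sketch below is illustrative only: the names `DeskewWindow`, `deskew_window`, and `constraints_hold` are hypothetical, and it assumes that (B.30), (B.31), and (B.33) are "no later than" constraints while (B.32) is an equality.

```python
from typing import NamedTuple


class DeskewWindow(NamedTuple):
    start: int  # first local-time count in the window (inclusive)
    end: int    # first local-time count after the window (exclusive)
    pre: int    # W_Deskew,pre  (B.28)
    post: int   # W_Deskew,post (B.29)
    size: int   # W_Deskew      (B.27)


def deskew_window(t_rcv_e: int, delta_abs_max: int) -> DeskewWindow:
    """Half-closed deskewing window of (B.26) around the expected reception time."""
    pre = delta_abs_max
    post = delta_abs_max + 1
    return DeskewWindow(t_rcv_e - pre, t_rcv_e + post, pre, post, pre + post)


def constraints_hold(t_ref: int, t_snd: int, r_pp: int,
                     delta_abs_max: int, t_proc_begin: int) -> bool:
    """Check (B.30), (B.31) and (B.33), with T_RCV,E fixed by (B.32)."""
    t_rcv_e = t_snd + r_pp  # (B.32)
    window = deskew_window(t_rcv_e, delta_abs_max)
    return (t_ref <= t_snd                  # (B.30)
            and t_ref <= window.start       # (B.31)
            and window.end <= t_proc_begin)  # (B.33)
```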


Relations (B.30) and (B.31) express basic time constraints given by the use of a common
reference time. Relation (B.32) captures the goal of source-receiver coordination, which is to
receive the message at the expected time of reception. Relation (B.33) is only relevant to the
composition of operations at the receiving node. Let $\Delta_{\text{REF-SND}}$ denote the delay from $T_{REF}$ to $T_{SND}$
measured in local clock ticks.

$\Delta_{\text{REF-SND}} = T_{SND} - T_{REF}$ (B.34)

Let $\Delta_{\text{REF-RCVWND}}$ denote the delay from $T_{REF}$ to $T_{RCV,E} - W_{Deskew,pre}$ measured in local clock ticks.

$\Delta_{\text{REF-RCVWND}} = (T_{RCV,E} - W_{Deskew,pre}) - T_{REF}$ (B.35)

Let $\Delta_{\text{REF-SND}}|_{min}$ and $\Delta_{\text{REF-RCVWND}}|_{min}$ denote the minimum values for $\Delta_{\text{REF-SND}}$ and
$\Delta_{\text{REF-RCVWND}}$, respectively. Relation (B.32) can be re-expressed as follows:

$\Delta_{\text{REF-SND}} + R_{PP} = \Delta_{\text{REF-RCVWND}} + W_{Deskew,pre}$ (B.36)


We are interested in finding the values for $\Delta_{\text{REF-SND}}$ and $\Delta_{\text{REF-RCVWND}}$ to achieve the earliest
communication satisfying (B.30), (B.31), and (B.32). We consider two cases.

Case 1: $\Delta_{\text{REF-SND}}|_{min} + R_{PP} \ge \Delta_{\text{REF-RCVWND}}|_{min} + W_{Deskew,pre}$

For this case, the message can be sent as soon as possible, but the window must be delayed to
align it with the expected time of reception.

$\Delta_{\text{REF-SND}} = \Delta_{\text{REF-SND}}|_{min}$ (B.37)

$\Delta_{\text{REF-RCVWND}} = \Delta_{\text{REF-SND}}|_{min} + R_{PP} - W_{Deskew,pre}$ (B.38)

Case 2: $\Delta_{\text{REF-SND}}|_{min} + R_{PP} < \Delta_{\text{REF-RCVWND}}|_{min} + W_{Deskew,pre}$

For this case, the window can be opened as soon as possible, but the message must be delayed
to achieve proper alignment.

$\Delta_{\text{REF-SND}} = \Delta_{\text{REF-RCVWND}}|_{min} + W_{Deskew,pre} - R_{PP}$ (B.39)

$\Delta_{\text{REF-RCVWND}} = \Delta_{\text{REF-RCVWND}}|_{min}$ (B.40)
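
The two cases can be folded into a single helper that returns the earliest send offset and window-opening offset, both measured from the reference time in local clock ticks. This is a sketch under the assumption that Case 1 applies when the earliest possible send already reaches or passes the earliest possible window opening; the function and parameter names are hypothetical.

```python
from typing import Tuple


def earliest_offsets(ref_snd_min: int, ref_rcvwnd_min: int,
                     r_pp: int, w_deskew_pre: int) -> Tuple[int, int]:
    """Return (delta_ref_snd, delta_ref_rcvwnd) for the earliest aligned communication."""
    if ref_snd_min + r_pp >= ref_rcvwnd_min + w_deskew_pre:
        # Case 1: send as early as possible and delay the window, (B.37)-(B.38).
        return ref_snd_min, ref_snd_min + r_pp - w_deskew_pre
    # Case 2: open the window as early as possible and delay the send, (B.39)-(B.40).
    return ref_rcvwnd_min + w_deskew_pre - r_pp, ref_rcvwnd_min
```

In both cases the returned pair satisfies the alignment relation (B.36), and neither offset falls below its minimum value.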

B.5. Message streams

Each message in a message stream is processed independently. Let $K$ denote the total number
of messages in the stream, and let $i$ denote the index for the messages in the stream, with $0 \le i \le K - 1$.
$T_{SND,i}$ is the local time at which the source sends the i-th message of the stream. $T_{RCV,E,i}$ is the
expected local time of reception for the i-th message of the stream. $\Delta_{stream}$ denotes the data
introduction interval at the source measured in local clock ticks.

The throughput capacities of the Communication Module and the Computation Module are
characterized by their respective minimum data introduction intervals [De Micheli 94]. Let $\Delta_{Comm}$
and $\Delta_{Comp}$ denote the minimum data introduction interval for the Communication Module and the
Computation Module, respectively. $\Delta_{Comm}$ and $\Delta_{Comp}$ are measured in local-clock ticks. For
proper processing, $\Delta_{stream}$ must be no smaller than either $\Delta_{Comm}$ or $\Delta_{Comp}$.

$\Delta_{stream} \ge \max(\Delta_{Comm}, \Delta_{Comp})$ (B.41)


B.5.1. Message delivery rate 

We would like to compute the number of messages that can be delivered during a particular 
time interval. We consider intervals during steady state transmission after the leading edge of the 
stream and before the trailing edge. Because of the drift rate of the clocks, the observed number 
of delivered messages can vary within a range. 

Let W RCV denote the size of the observation window at the receiving node measured in local 
clock ticks. Q denotes the number of messages delivered during the observation window. X SR c 
denotes the data introduction interval measured in nominal clock ticks. w RC v denotes the size of 
the observation window at the receiving node measured in nominal clock ticks. Let t de i iveri i denote 
the real time at which message i is delivered. 

tdeliver.i — tdeliver.O "t" i 7-SRC (B.42) 

Let $t_{obs,l}$ and $t_{obs,h}$ denote the beginning and end times, respectively, of the observation 
window. The observer records received messages during the closed interval $[t_{obs,l}, t_{obs,h}]$. $t_{obs,l}$ and 
$t_{obs,h}$ are related by the size of the observation window. 

$t_{obs,h} = t_{obs,l} + w_{RCV}$ (B.43)

The following constraints are applied in order to determine the number of observed messages. 

$t_{deliver,0} < t_{obs,l}$ (B.44)

$t_{deliver,1} \ge t_{obs,l}$ (B.45)

$t_{deliver,Q} \le t_{obs,h}$ (B.46)

$t_{deliver,Q+1} > t_{obs,h}$ (B.47)
For these constraints, a total of $Q$ messages in the index range 1 to $Q$ are delivered within the 
observation interval. The maximum value of $Q$ is derived as follows. Relation (B.46) can be 
re-expressed as: 

$t_{deliver,0} + Q\lambda_{SRC} \le t_{obs,l} + w_{RCV}$ (B.48)

So: 

$Q \le [(t_{obs,l} - t_{deliver,0}) + w_{RCV}]/\lambda_{SRC}$ (B.49)

The right-hand side reaches its maximum value when $t_{obs,l} - t_{deliver,0} = \lambda_{SRC}$. In that case, 
$t_{deliver,1} = t_{obs,l}$. So: 

$Q \le [\lambda_{SRC} + w_{RCV}]/\lambda_{SRC}$ (B.50)

Since $Q$ is an integer, we can take the floor on the right-hand side of the expression. Then: 

$Q|_{max} = \lfloor w_{RCV}/\lambda_{SRC} \rfloor + 1$ (B.51)

For a fast source clock: 

$\lambda_{SRC,fast} = \Lambda_{stream}/(1 + \rho_0)$ (B.52)

For a slow receiver clock: 

$w_{RCV,slow} = (1 + \rho_0)W_{RCV}$ (B.53)

Therefore, for the maximum value of $Q$: 

$Q|_{max} = \lfloor (W_{RCV}/\Lambda_{stream})(1 + \rho_0)^2 \rfloor + 1$ (B.54)

The minimum value of $Q$ is derived as follows. Relation (B.47) can be re-expressed as: 

$t_{deliver,0} + (Q + 1)\lambda_{SRC} > t_{obs,l} + w_{RCV}$ (B.55)

So: 

$Q > [(t_{obs,l} - t_{deliver,0}) + w_{RCV}]/\lambda_{SRC} - 1$ (B.56)

The right-hand side approaches its minimum value as $t_{obs,l} - t_{deliver,0}$ approaches 0. So: 

$Q > w_{RCV}/\lambda_{SRC} - 1$ (B.57)

$Q$ is an integer strictly larger than $w_{RCV}/\lambda_{SRC} - 1$. The smallest integer that satisfies this relation is 
given by: 

$Q|_{min} = \lfloor w_{RCV}/\lambda_{SRC} \rfloor$ (B.58)

For a slow source clock: 

$\lambda_{SRC,slow} = (1 + \rho_0)\Lambda_{stream}$ (B.59)

For a fast receiver clock: 

$w_{RCV,fast} = W_{RCV}/(1 + \rho_0)$ (B.60)

Therefore, for the minimum value of $Q$: 

$Q|_{min} = \lfloor (W_{RCV}/\Lambda_{stream})/(1 + \rho_0)^2 \rfloor$ (B.61)
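As an illustration of (B.54) and (B.61), the following minimal Python sketch evaluates the range of the delivered-message count for a given observation window. The function and parameter names are illustrative and do not come from the ROBUS-2 design, and the numerical values in the usage note are hypothetical.

```python
import math


def delivered_message_bounds(window_local, stream_interval_local, rho0):
    """Return (q_min, q_max) for the number of delivered messages.

    window_local          -- observation window size in local clock ticks
    stream_interval_local -- data introduction interval in local clock ticks
    rho0                  -- bound on the oscillator drift rate (unitless)
    """
    ratio = window_local / stream_interval_local
    # (B.54): fast source clock combined with a slow receiver clock.
    q_max = math.floor(ratio * (1 + rho0) ** 2) + 1
    # (B.61): slow source clock combined with a fast receiver clock.
    q_min = math.floor(ratio / (1 + rho0) ** 2)
    return q_min, q_max
```

For example, with a hypothetical window of 1000 local ticks, an interval of 30 local ticks, and a drift bound of 1e-4, the call `delivered_message_bounds(1000, 30, 1e-4)` gives a range of 33 to 34 messages.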

B.5.2. Expected local time of reception 

The transmission times for the messages are related by the data introduction interval: 

$T_{SND,i} = T_{SND,0} + i\Lambda_{stream}$ (B.62)

At the receiver, the relation among the messages is similar. 

$T_{RCV,E,i} = T_{RCV,E,0} + i\Lambda_{stream}$ (B.63)

Using the analysis for single-message communication: 

$T_{RCV,E,i} = T_{SND,i} + R_{PP}$ (B.64)

Let $T_{RCV,i}$ denote the actual time of reception for the $i$-th message. From the analysis of 
single-message communication, $T_{RCV,i}$ and $T_{RCV,E,i}$ are related as follows: 

$|T_{RCV,i} - T_{RCV,E,i}| \le \Delta_{PP,RCV}|_{abs-max}$ (B.65)

Re-expressing (B.65): 

$T_{RCV,E,i} - \Delta_{PP,RCV}|_{abs-max} \le T_{RCV,i} \le T_{RCV,E,i} + \Delta_{PP,RCV}|_{abs-max}$ (B.66)

The stream as a whole should be received within the following local time interval: 

$[T_{RCV,E,0} - \Delta_{PP,RCV}|_{abs-max},\ T_{RCV,E,K-1} + \Delta_{PP,RCV}|_{abs-max}]$ (B.67)

Re-expressing (B.67): 

$[T_{RCV,E,0} - \Delta_{PP,RCV}|_{abs-max},\ T_{RCV,E,0} + (K-1)\Lambda_{stream} + \Delta_{PP,RCV}|_{abs-max}]$ (B.68)
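The relations (B.62) through (B.68) can be sketched in a few lines of Python. This is only an illustration under the stated definitions: the function names are hypothetical, all quantities are taken to be integer counts of local clock ticks, and the values in the usage note are made up.

```python
def expected_reception_times(t_snd_0, r_pp, k, stream_interval):
    """Expected local reception time of each message, per (B.62)-(B.64)."""
    return [t_snd_0 + r_pp + i * stream_interval for i in range(k)]


def stream_reception_window(t_rcv_e_0, k, stream_interval, delta_abs_max):
    """Local-time interval in which the whole stream should arrive, per (B.68)."""
    if k < 1:
        raise ValueError("a stream contains at least one message")
    earliest = t_rcv_e_0 - delta_abs_max
    latest = t_rcv_e_0 + (k - 1) * stream_interval + delta_abs_max
    return earliest, latest
```

With a first expected reception at local time 100, five messages spaced 20 ticks apart, and a reception-time error bound of 3 ticks, `stream_reception_window(100, 5, 20, 3)` returns the interval (97, 183).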


B.5.3. Message reception rate 

The $\Lambda_{stream}$ communication parameter gives the nominal message reception rate for the stream 
in units of ticks per message. An important consideration for the processing of message streams 
is the relation between $\Lambda_{stream}$ and $\Delta_{PP,RCV}|_{abs-max}$. As presented above, $\Delta_{PP,RCV}|_{abs-max}$ measures the 
uncertainty in the time of reception of each message. In particular, the total uncertainty in the 
time of reception for a particular message is $2\Delta_{PP,RCV}|_{abs-max}$ local clock ticks centered around the 
expected time of reception. A message from a good source can be received at any of the 
$2\Delta_{PP,RCV}|_{abs-max} + 1$ triggering edges of the oscillator clock in the corresponding reception interval. 
Let $Z$ denote the number of messages from a good source that can be received during a 
$2\Delta_{PP,RCV}|_{abs-max}$ interval. Then: 

$Z = \lfloor 2\Delta_{PP,RCV}|_{abs-max}/\Lambda_{stream} \rfloor + 1$ (B.69)


B.5.3.1. Non-overlapping reception intervals 

If $\Lambda_{stream} > 2\Delta_{PP,RCV}|_{abs-max}$, the expected reception intervals for consecutive messages will not 
overlap or even coincide end-to-end (i.e., no shared triggering edges in consecutive expected 
reception intervals). For this case, $Z = 1$, which means that the messages of the stream are 
received as separate communications with no interaction. 


B.5.3.2. Overlapping reception intervals 

If $\Lambda_{stream} \le 2\Delta_{PP,RCV}|_{abs-max}$, the expected reception intervals for consecutive messages will 
overlap or coincide at the ends. For this case, $Z > 1$, which means that the interaction between the 
messages must be taken into consideration. This is especially important for the diagnosis of 
timing errors. 
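The distinction between the two cases can be checked numerically with (B.69). The sketch below assumes that both the reception-time error bound and the data introduction interval are positive integer counts of local clock ticks; the function names are illustrative only.

```python
def messages_per_uncertainty_interval(delta_abs_max, stream_interval):
    """Z from (B.69): messages receivable within one 2*delta interval."""
    return (2 * delta_abs_max) // stream_interval + 1


def reception_intervals_overlap(delta_abs_max, stream_interval):
    """True when consecutive reception intervals overlap or touch (B.5.3.2)."""
    return stream_interval <= 2 * delta_abs_max
```

For instance, an error bound of 3 ticks with an interval of 20 ticks gives Z = 1 (no interaction), while an error bound of 10 ticks with the same interval gives Z = 2, the boundary case in which the intervals coincide at their ends.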


B.5.4. Load size for a message reception buffer 

We refer to the number of messages stored in a buffer as the load on the buffer. The function 
of the message receive buffer is to collect the messages received at the Computation Process. For 
single-message communication, it is expected that the processing of each message will begin at or 
before the next message is received. The same can occur for a message stream in which the 
reception intervals for consecutive messages do not overlap. In these cases, the load of the 
receive buffer is less than or equal to 1. From this point on, we only consider cases in which the 
processing of individual messages may begin after the reception of subsequent messages in the 
stream. This includes cases of overlapping and non-overlapping reception intervals. 

Let $\Delta_{PROC,begin}$ denote the delay in the beginning of processing of a message with respect to the 
corresponding expected time of reception. We assume that the interval between the beginning of 
processing of consecutive messages is the same as the data introduction interval for the message 
stream, $\Lambda_{stream}$. $T_{PROC,i}$ denotes the local time at the beginning of processing for message $i$. 

$T_{PROC,i} = T_{RCV,E,i} + \Delta_{PROC,begin}$ (B.70)

B.5.4.1. Combined message synchronization and buffering 

Figure B.3 illustrates the interconnection of functions for this case. $CLK_{Rx}$ denotes the 
oscillator clock at the receiving node. $STB_{Rx}$ denotes the strobe signal indicating that a new 
message is ready. The Link Receiver transfers the messages to the Receive Buffer as soon as 
they are ready. The output of the receiver is assumed to be asynchronous with respect to the 
oscillator clock. The Receive Buffer is an asynchronous FIFO, which means that the push (i.e., 
write) and pop (i.e., remove) action signals may be synchronous with respect to different clock 
signals. In effect, in addition to being a buffer, the asynchronous FIFO serves as a signal 
synchronizer for data crossing from one clock domain to the other. Note that the data is read for 
computation one tick before it is popped from the receive buffer. 


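[Figure B.3 (block diagram, not reproduced in this text version): the Link Receiver feeds the Receive Buffer, an asynchronous FIFO, which delivers messages to the Computation Process; the signals shown are STB_Rx and CLK_Rx.]
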

Figure B.3: Reception using combined message synchronization and buffering 

In order to ensure a read-after-write sequence at the Receive Buffer during normal operation, 
the reading of a particular message by the Computation Process should be triggered after the end 
of the corresponding reception interval. The following relation must hold in order to satisfy this 
property. 

$\Delta_{PROC,begin} > \Delta_{PP,RCV}|_{abs-max}$ (B.71)

Again, note that the pop takes place one tick after the start of processing for each message. 
$t_{deliver,i}$ denotes the real time at which message $i$ is written to the buffer. Note that a message gets 
pushed at the same time that it is delivered. With $\lambda_{SRC}$ denoting the data introduction interval at 
the source node, $t_{deliver,i}$ is given by the following equation. 

$t_{deliver,i} = t_{deliver,0} + i\lambda_{SRC}$ (B.72)

Let $t_{pop,i}$ denote the real time at which message $i$ is popped from the buffer. The pop times for 
the Computation Process are given by the following relation, with $\lambda_{RCV}$ equal to the data 
introduction interval at the Computation Process. 

$t_{pop,i} = t_{pop,0} + i\lambda_{RCV}$ (B.73)

Let $Q_{deliver}(t)$ denote the number of messages delivered by time $t$. $Q_{pop}(t)$ denotes the number 
of messages popped by time $t$. $Q_{Async-Buffer}(t)$ denotes the number of messages held by the 
asynchronous Receive Buffer at time $t$. 

For $Q_{deliver}(t)$: 

$Q_{deliver}(t) = \begin{cases} 0, & \text{for } t < t_{deliver,0} \\ \lfloor (t - t_{deliver,0})/\lambda_{SRC} \rfloor + 1, & \text{for } t_{deliver,0} \le t \le t_{deliver,0} + (K-1)\lambda_{SRC} \\ K, & \text{for } t > t_{deliver,0} + (K-1)\lambda_{SRC} \end{cases}$ (B.74)

For $Q_{pop}(t)$: 

$Q_{pop}(t) = \begin{cases} 0, & \text{for } t < t_{pop,0} \\ \lfloor (t - t_{pop,0})/\lambda_{RCV} \rfloor + 1, & \text{for } t_{pop,0} \le t \le t_{pop,0} + (K-1)\lambda_{RCV} \\ K, & \text{for } t > t_{pop,0} + (K-1)\lambda_{RCV} \end{cases}$ (B.75)

For $Q_{Async-Buffer}(t)$: 

$Q_{Async-Buffer}(t) = Q_{deliver}(t) - Q_{pop}(t)$ (B.76)

To determine the maximum load for the receive buffer, we consider the case of a fast source 
clock and a slow receiver clock. Thus: 

$\lambda_{SRC} = \lambda_{SRC,fast} = \Lambda_{stream}/(1 + \rho_0)$ (B.77)

$\lambda_{RCV} = \lambda_{RCV,slow} = (1 + \rho_0)\Lambda_{stream}$ (B.78)

Assume that the first message is delivered at the earliest possible time. That is: 

$t_{deliver,0} = c_{RCV}(T_{RCV,E,0} - \Delta_{PP,RCV}|_{abs-max})$ (B.79)

The time of the first pop action is: 

$t_{pop,0} = c_{RCV}(T_{RCV,E,0} + \Delta_{PROC,begin} + 1)$ 
$= t_{deliver,0} + (1 + \rho_0)(\Delta_{PP,RCV}|_{abs-max} + \Delta_{PROC,begin} + 1)$ (B.80)

Since the source has a faster clock, the number of buffered messages can increase up to the 
instant the last message is delivered (i.e., $t = t_{deliver,K-1} = t_{deliver,0} + (K-1)\lambda_{SRC}$). Thus, the maximum 
buffer load is given by $Q_{Async-Buffer}$ evaluated at $t_{deliver,K-1}$. 

$Q_{Async-Buffer}(t)|_{max} = Q_{Async-Buffer}(t_{deliver,K-1})$ 
$= K - \lfloor [(K-1)\lambda_{SRC,fast} - (1 + \rho_0)(\Delta_{PP,RCV}|_{abs-max} + \Delta_{PROC,begin} + 1)]/\lambda_{RCV,slow} + 1 \rfloor$ 
$= K - \lfloor (K-1)/(1 + \rho_0)^2 - (\Delta_{PP,RCV}|_{abs-max} + \Delta_{PROC,begin} + 1)/\Lambda_{stream} + 1 \rfloor$ (B.81)
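The following Python sketch evaluates the asynchronous-buffer load in two ways: directly from the piecewise counts of (B.74) and (B.75) at the delivery time of the last message, and from the closed form (B.81). It is a hedged illustration, not part of the ROBUS-2 design: the function names are hypothetical, the real-time origin is placed at the first delivery, and the clamp to zero in the closed form is an addition that covers short streams in which no pop has occurred by the time of the last delivery.

```python
import math


def count_events(t, t_first, interval, k):
    """Piecewise event count of (B.74)/(B.75) for k periodic events."""
    if t < t_first:
        return 0
    if t >= t_first + (k - 1) * interval:
        return k  # at the upper boundary the middle branch also yields k
    return math.floor((t - t_first) / interval) + 1


def async_load_at_last_delivery(k, stream_interval, rho0, delta_abs_max, delta_proc_begin):
    """Q_deliver - Q_pop at the last delivery (fast source, slow receiver)."""
    lam_src_fast = stream_interval / (1 + rho0)    # (B.77)
    lam_rcv_slow = (1 + rho0) * stream_interval    # (B.78)
    t_deliver_0 = 0.0                              # assumed real-time origin
    t_pop_0 = t_deliver_0 + (1 + rho0) * (delta_abs_max + delta_proc_begin + 1)  # (B.80)
    t_last = t_deliver_0 + (k - 1) * lam_src_fast
    delivered = count_events(t_last, t_deliver_0, lam_src_fast, k)
    popped = count_events(t_last, t_pop_0, lam_rcv_slow, k)
    return delivered - popped


def async_max_load_closed_form(k, stream_interval, rho0, delta_abs_max, delta_proc_begin):
    """Closed form (B.81); the max(..., 0) clamp is an added safeguard."""
    popped = math.floor((k - 1) / (1 + rho0) ** 2
                        - (delta_abs_max + delta_proc_begin + 1) / stream_interval + 1)
    return k - max(popped, 0)
```

With hypothetical values of 1000 messages, an interval of 20 ticks, a drift bound of 1e-3, a reception error bound of 3 ticks, and a processing delay of 4 ticks, both functions return a maximum load of 3 messages.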


B.5.4.2. Separate message synchronization and buffering 

Figure B.4 illustrates the interconnection of functions for this case. The receiver is assumed to 
hold the message until it is processed by the synchronizer. For this synchronization mechanism, 
the input rate must be slower than the local clock frequency to ensure at least one triggering edge 
of the oscillator clock per delivered message. Thus, $\Lambda_{stream}$ must be larger than 1. Since $\Lambda_{stream}$ is 
an integer, its value must be at least 2 (i.e., $\Lambda_{stream} \ge 2$). 

Figure B.4: Reception using separate message synchronization and buffering 


This configuration differs from the one using the asynchronous FIFO in that the 
synchronization is performed by a dedicated synchronizer. At a minimum, this element 
introduces a one-tick delay in the transfer of messages from the Link Receiver to the Receive 
Buffer. The worst-case delay in storing the message in the buffer is two oscillator clock ticks. 
Therefore, compared to the timing of the circuit with the asynchronous FIFO, the writing of 
streamed messages to the synchronous FIFO buffer begins at least 1 local tick later and can end 
up to 2 oscillator clock ticks later. 


To determine the maximum load for the receive buffer, we consider the case of a fast source 
clock and a slow receiver clock. The maximum load is assessed at the earliest time at which the 
last message of the input stream can be written into the buffer. Let $t_{write,0}$ denote the earliest real 
time at which a received message is written to the buffer. 

$t_{write,0} = c_{RCV}(T_{RCV,E,0} - \Delta_{PP,RCV}|_{abs-max} + 1) = t_{deliver,0} + (1 + \rho_0)$ (B.82)

The delivery time for the last message is: 

$t_{deliver,K-1} = t_{deliver,0} + (K-1)\lambda_{SRC,fast}$ (B.83)

After delivery, the message must be synchronized and written to the buffer. In the fastest case, 
the delivered message is immediately read by the synchronizer and presented to the buffer for 
loading, which will then occur 1 tick later. 

$t_{write,K-1} = t_{deliver,K-1} + (1 + \rho_0) = t_{deliver,0} + (K-1)\lambda_{SRC,fast} + (1 + \rho_0)$ (B.84)

Let $Q_{Sync-Buffer}(t)$ denote the number of messages held by the synchronous receive buffer at time $t$. 
The maximum load is given by: 

$Q_{Sync-Buffer}(t)|_{max} = Q_{Sync-Buffer}(t_{write,K-1})$ 
$= K - Q_{pop}(t_{write,K-1})$ 
$= K - \lfloor (t_{write,K-1} - t_{pop,0})/\lambda_{RCV,slow} + 1 \rfloor$ 
$= K - \lfloor (K-1)/(1 + \rho_0)^2 - (\Delta_{PP,RCV}|_{abs-max} + \Delta_{PROC,begin})/\Lambda_{stream} + 1 \rfloor$ (B.85)
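The closed form (B.85) can be evaluated as in the following Python sketch. The function name is illustrative, the inputs are assumed to be in local clock ticks, and the clamp to zero is an added safeguard (not part of the source formula) for short streams in which no message has been popped by the time the last one is written.

```python
import math


def sync_buffer_max_load(k, stream_interval, rho0, delta_abs_max, delta_proc_begin):
    """Maximum load of the synchronous receive buffer per (B.85)."""
    popped = math.floor((k - 1) / (1 + rho0) ** 2
                        - (delta_abs_max + delta_proc_begin) / stream_interval + 1)
    # Added safeguard: a negative count means no pop has happened yet.
    return k - max(popped, 0)
```

As a hypothetical example, a stream of 10 messages with an interval of 20 ticks, no drift, a reception error bound of 3 ticks, and a processing delay of 45 ticks yields a maximum load of 3 messages.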
Appendix C. Analysis of the clock synchronization protocols 


This appendix examines the timing aspects for the local-time synchronization scheme. The 
diagnostic system works in close coordination with the clock synchronization system to 
determine the status of the bus and to specify the nodes eligible to participate in clock 
synchronization operations. That aspect of the ROBUS is outside the scope of this appendix. 
The analysis presented here uses the fundamental fault-tolerance concepts presented in Appendix 
A and the point-to-point communication concepts presented in Appendix B. 


C.1. Clock synchronization system 

Each ROBUS node is driven by an independent, free-running physical oscillator and a 
local-time clock. The oscillators are characterized by a nominal frequency (denoted by $f_0$) and a 
bounded drift rate with respect to real time (denoted by $\rho_0$). $\rho_0$ is a small, positive, real-valued, 
unitless constant. $\tau_0$ denotes the nominal tick duration measured in seconds: 1 nominal tick = $\tau_0$ 
seconds = $1/f_0$ seconds. Let $\tau_x$ denote the actual tick duration for oscillator $x$. The bound on the 
drift rate of $x$ can be expressed as follows: 

$\tau_0/(1 + \rho_0) \le \tau_x \le (1 + \rho_0)\tau_0$ (C.1)

So, an actual oscillator has a tick duration between $1/(1 + \rho_0)$ and $(1 + \rho_0)$ nominal ticks. 
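A direct consequence of (C.1) is that a span of local clock ticks corresponds to a bounded range of real time. The short Python sketch below makes that conversion explicit. The function name and the numerical values in the usage note are hypothetical; no particular oscillator frequency or drift bound is implied for the ROBUS.

```python
def real_duration_bounds(local_ticks, f0_hz, rho0):
    """Shortest and longest real-time duration (seconds) of local_ticks ticks."""
    tau0 = 1.0 / f0_hz                              # nominal tick duration
    shortest = local_ticks * tau0 / (1 + rho0)      # fastest allowed oscillator
    longest = local_ticks * (1 + rho0) * tau0       # slowest allowed oscillator
    return shortest, longest
```

For example, with an assumed nominal frequency of 1 MHz and a drift bound of 1e-4, 1000 local ticks last between roughly 0.9999 ms and 1.0001 ms of real time.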

The local-time clock of a node is essentially a counter driven by the local physical oscillator. 
The local time is equal to the state of the counter. Resetting the counter sets the local time to 0. 

The clock synchronization system enables the nodes to use the local time as a reference for the 
coordination of distributed operations. A basic requirement for proper distributed coordination is 
that the relative clock skews remain within known bounds. The relative skew between two 
clocks is the real time elapsed from the instant one clock makes a particular state transition (i.e., 
the count reaches a particular value) until the other clock makes the same transition. In general, 
the relative skew between two events is equal to the real time elapsed between the occurrence of 
the events. Bounded relative skew is achieved by the generation and preservation of approximate 
real-time agreement on the transitions of the local-time clock. The synchronization protocols 
deliver high-precision distributed events used as references to reset the local-time clocks. The 
state of a local-time clock indicates the time elapsed since the last synchronization-reset event. 
The bound on the relative skew between synchronized clocks is tightest at the time of the 
synchronization reset. After the reset, the local times can drift apart from each other and from 
real time at rates determined by the drift rates of the oscillators. The clocks are synchronized at 
regular time intervals in order to ensure that the relative skews remain within known bounds. 

Figure C.1 illustrates the conceptual mode transitions for the clock synchronization system. 
Normally there is a clique executing the Synchronization Preservation (SP) protocol to ensure 
that their relative local-time skews remain within known bounds. Nodes in this mode are said to 
be in a synchronized state. A goal of every node is to reach and remain in this state. In the 
context of the synchronization system, nodes operating in a mode other than Synchronization 
Preservation are referred to as recovering nodes. After a power-on enable or the detection of a 
failure, a node examines the activity on the bus. If a clique is found, the recovering node 
executes the Synchronization Acquisition (SA) mode in order to synchronize its local time to the 
time of the clique. If a clique is not found, the recovering node transitions to the Initial 
Synchronization (IS) mode. After achieving synchronization, the recovering node executes the 
Synchronization Preservation protocol. 
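The mode transitions described above can be summarized as a small state machine. The Python sketch below is only a reading aid under explicit assumptions: the BUS_OBSERVATION state, the function name, and the boolean inputs are hypothetical modeling choices, and the conditions are limited to the ones stated in the text (clique found or not, synchronization achieved, failure detected).

```python
from enum import Enum


class Mode(Enum):
    BUS_OBSERVATION = "examining bus activity"   # hypothetical helper state
    INITIAL_SYNCHRONIZATION = "IS"
    SYNCHRONIZATION_ACQUISITION = "SA"
    SYNCHRONIZATION_PRESERVATION = "SP"


def next_mode(mode, clique_found=False, synchronized=False, failure_detected=False):
    """Return the next conceptual mode of a node."""
    if failure_detected:
        # After a detected failure the node examines the bus again.
        return Mode.BUS_OBSERVATION
    if mode is Mode.BUS_OBSERVATION:
        # Join an existing clique if there is one, otherwise start from scratch.
        if clique_found:
            return Mode.SYNCHRONIZATION_ACQUISITION
        return Mode.INITIAL_SYNCHRONIZATION
    if mode in (Mode.INITIAL_SYNCHRONIZATION, Mode.SYNCHRONIZATION_ACQUISITION):
        # A recovering node moves to preservation once it is synchronized.
        return Mode.SYNCHRONIZATION_PRESERVATION if synchronized else mode
    return mode  # a synchronized node stays in Synchronization Preservation
```

A node that powers on would start in the hypothetical BUS_OBSERVATION state; calling `next_mode` with `clique_found=True` then moves it to Synchronization Acquisition.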


[Figure C.1 (state diagram, not reproduced in this text version); surviving label: Power-on enable.]

Figure C.1: Conceptual mode transitions for the clock synchronization system 

At the time of entry into the Synchronization Acquisition mode, the recovering node is in an 
asynchronous state in which there is no significant relation between its local time and the local 
time of the clique. The recovering node uses an Accept function to capture the synchronization 
events in the agreement propagation phase of the Synchronization Preservation protocol. This 
requires that the Accept function only receive synchronization messages from the same execution 
of the protocol. This is accomplished by enabling the Accept function after a frame 
synchronization step in which the gap between executions of the Synchronization Preservation 
protocol is found. 

In general, a group of nodes enters the Initial Synchronization mode within a time interval of 
known bounded duration. When a recovering node enters this mode, it expects that there is at 
least one node of the opposite kind that also makes the transition within the bounded time 
interval. This interval duration is in effect a bound on the relative local-time skew for the 
initializing nodes. Before the execution of the synchronization protocol, these nodes are said to 
be in an unsynchronized state since the initial skew bound can be relatively large compared to 
the skew after the execution of the protocol. 
Figure C.2 illustrates how the mode transitions are related in time. A group of nodes enters 
Initial Synchronization with a large bound on the relative skew, denoted by $\pi_{IS}$. At the end of the 
protocol execution, the local time is set to 0 with the bound on the relative skew reduced to the 
level required for normal operation, denoted by $\pi_{SP}$. At local time $T_{SP}$, the Synchronization 
Preservation protocol is executed to ensure that the skew remains within the expected bound. 
This cyclic operation continues until a failure occurs or the system is shut down. A recovering 
node in Synchronization Acquisition trying to synchronize to the clique executes the Frame 
Synchronization (FS) protocol followed by the Synchronization Capture (SC) protocol. The 
duration of the Frame Synchronization protocol execution depends on factors like the total 
number of nodes of the opposite kind, the number of untrustworthy nodes of the opposite kind 
active on the bus, the bound on the relative local-time skew of the nodes, and the position of the 
start of the protocol relative to local time of the clique nodes. Synchronization Capture is enabled 
immediately after the execution of Frame Synchronization is complete. The relative skew 
achieved by Synchronization Acquisition is within the bounds of the skew for normal operation. 
Figure C.2: Timing of mode transitions for the clock synchronization system 

The Initial Synchronization, Synchronization Preservation, and Synchronization Capture 
protocols are based on the same theory of distributed computation using Accept functions to 
process timing events. Figure C.3 illustrates the message flow graph examined in this appendix. 
This graph includes all the processes and messages required for the three protocols. 


C.2. Timing model 

This section describes how the ROBUS nodes are modeled for the analysis of the 
synchronization protocols. 


C.2.1. Computation Module 

The Computation Module is modeled in terms of two components: Computation Process and 
Send Process. The Computation Process handles the processing of received messages according 
to the requirements of the protocol being executed. The Send Process handles the timing and 
formatting requirements for the output messages. 
[Figure C.3 (message flow graph, not reproduced in this text version); column labels: Stage 1, Stage 2, Stage 3, Stage 4.]

Figure C.3: Combined message flow graph for the analysis of the synchronization protocols 

The Computation Process has two sub-processes: Receive and Accept. The Receive Process 
provides timing delay and in-line error detection for the received messages. The input-output 
delay of the Receive Process depends on the synchronization process being executed. For a 
particular process, every input has the same input-output delay. This behavior preserves the 
relative positions of the received synchronization messages, which allows the Accept function to 
properly read the relative skews of the received timing events. The delay of the Receive Process 
depends mainly on the uncertainty in the time of reception and on the time required to process the 
messages for error detection and diagnosis. The Accept Process performs the voting of the set of 
received messages to produce a single result event. The delay of the Accept Process depends on 
the protocol being executed. The Accept Process produces an output event with a predetermined 
delay with respect to the time at which it receives the input event to be selected. For the events 
not selected, the Accept function appears to have a variable input-output delay. 

For each of the synchronization processes, the Computation Process has a fixed delay from the 
time when the event to be selected is received to the time when the Accept output is asserted. 
The delays of the Receive Process and the Accept Process are combined into a single parameter 
denoted by A, which is measured in units of local clock ticks. 

The Send Process handles the transmission of messages. The process delay depends on the 
protocol and process being executed. B denotes the send delay with respect to the process- 
triggering event, and it is measured in units of local clock ticks. 


C.2.2. Communication Module 

The behavior of the Communication Module is independent of the protocol being executed by 
the Computation Module. Appendix B presents the timing model for point-to-point 
communication. 


C.3. First stage 

Figure C.4 illustrates the detailed message flow graph for stage 1 in a 3x3 system (i.e., 3 BIUs 
and 3 RMUs). 
Figure C.4: Detailed message flow graph for stage 1 in a 3x3 system 


C.3.1. Expected time of reception for process P1 

The analysis for point-to-point communication presented in Appendix B can be leveraged 
for the problem of determining the local time range of reception of INIT messages in process P1. 
This is covered in section A.10.2.1 of this appendix for the Initial Synchronization and 
Synchronization Preservation protocols. 


C.3.2. Bound on the observed relative skew of received messages for process P1 

Let $\Pi_{P1,RCV}$ denote the bound on the relative skew observed in process P1 for the received 
messages from process P0 at trustworthy BIUs. $\Pi_{P1,RCV}$ is measured in local clock ticks. $\Pi_{P1,RCV}$ 
is used to check for agreement among the received inputs and also to check agreement with the 
result of the Accept output. 

Let $T_{P0}$ denote the local time at which a BIU node sends the INIT message in process P0 (i.e., 
the local time when the source's Computation Module signals the Communication Module to 
send the INIT message). $t_{P0,l}$ and $t_{P0,h}$ denote the earliest and latest real times, respectively, at 
which the trustworthy BIU nodes send INIT in process P0. Let $\pi_{P0}$ denote the bound on the 
relative local-time skew for the trustworthy BIUs. $\pi_{P0}$ is assumed to apply for the duration of the 
protocol execution. $\pi_{P0}$ also bounds the precision with which the trustworthy BIU nodes send the 
INIT messages. 

$\pi_{P0} = t_{P0,h} - t_{P0,l}$ (C.2)
Let $t_{P1,RCV,l}$ and $t_{P1,RCV,h}$ denote the earliest and latest real times, respectively, at which an INIT 
message from a trustworthy BIU node can be received in process P1 at the trustworthy RMUs. 

$t_{P1,RCV,l} = t_{P0,l} + \Gamma_{PP,l}$ (C.3)

$t_{P1,RCV,h} = t_{P0,h} + \Gamma_{PP,h}$ (C.4)


Let $T_{P1,RCV,l}$ and $T_{P1,RCV,h}$ denote the earliest and latest local times, respectively, at which a node in 
process P1 can receive messages from process P0 at trustworthy BIU nodes. $\Delta_{P1,RCV}$ denotes the 
measured skew between the earliest and latest received messages from trustworthy BIU nodes 
(i.e., $\Delta_{P1,RCV} = T_{P1,RCV,h} - T_{P1,RCV,l}$). We need to determine the maximum value of $\Delta_{P1,RCV}$. Using 
(C.3), (C.4), and the local-clock function (introduced in Appendix B): 

$c_{RCV}(T_{P1,RCV,h}) - c_{RCV}(T_{P1,RCV,l}) \le t_{P1,RCV,h} - t_{P1,RCV,l}$ (C.5)

From the constraint that the drift rate of the local clocks be $\rho_0$-bounded and the definition of 
$\Delta_{P1,RCV}$: 

$\Delta_{P1,RCV}/(1 + \rho_0) \le c_{RCV}(T_{P1,RCV,h}) - c_{RCV}(T_{P1,RCV,l})$ (C.6)

Combining (C.5) and (C.6), and using the fact that $\Delta_{P1,RCV}$ is an integer: 

$\Delta_{P1,RCV} \le \lfloor (1 + \rho_0)(t_{P1,RCV,h} - t_{P1,RCV,l}) \rfloor$ (C.7)

$\Pi_{P1,RCV}$ is given by the maximum value of $\Delta_{P1,RCV}$, with $\epsilon_{PP} = \Gamma_{PP,h} - \Gamma_{PP,l}$: 

$\Pi_{P1,RCV} = \Delta_{P1,RCV}|_{max} = \lfloor (1 + \rho_0)(t_{P1,RCV,h} - t_{P1,RCV,l}) \rfloor = \lfloor (1 + \rho_0)(\pi_{P0} + \epsilon_{PP}) \rfloor$ (C.8)

C.3.3. Relative skew of the Accept outputs for process P1 

Let $A_{P1}$ denote the delay (in local-clock ticks) of the Computation Process in process P1 
measured from the local time of reception of the selected message until the Accept output is 
asserted. $t_{P1,A,l}$ and $t_{P1,A,h}$ denote the earliest and latest real times, respectively, at which an Accept 
output in process P1 at the trustworthy RMUs can be asserted. 

$t_{P1,A,l} = t_{P0,l} + \Gamma_{PP,l} + A_{P1}/(1 + \rho_0)$ (C.9)

$t_{P1,A,h} = t_{P0,h} + \Gamma_{PP,h} + (1 + \rho_0)A_{P1}$ (C.10)

Therefore, the Accept functions of the trustworthy RMU nodes assert their outputs during a 
real-time interval with the following duration: 

$t_{P1,A,h} - t_{P1,A,l} = [\pi_{P0} + \Gamma_{PP,h} + (1 + \rho_0)A_{P1}] - [\Gamma_{PP,l} + A_{P1}/(1 + \rho_0)]$ 
$= \pi_{P0} + \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]A_{P1}$ (C.11)

Let AEV_P1 denote the set of asymmetric BIU eligible voters in process P1 at a trustworthy 
RMU node. |AEV_P1| denotes the cardinality of AEV_P1. $\pi_{P1,A}$ denotes the bound on the 
real-time relative skew of the Accept outputs in process P1 at the trustworthy RMUs. If |AEV_P1| = 0 
for each trustworthy RMU node, they essentially accept on the same message. 

$\pi_{P1,A}|_{|AEV\_P1| = 0} = \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]A_{P1}$ (C.12)

If |AEV_P1| ≠ 0 for some trustworthy RMU, all we know with certainty is that the RMU nodes 
accept on a message from a trustworthy BIU node or a message from an untrustworthy BIU node 
flanked by messages from trustworthy BIU nodes. 

$\pi_{P1,A}|_{|AEV\_P1| \ne 0} = \pi_{P0} + \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]A_{P1}$ (C.13)

From this point on, unless otherwise stated: 

$\pi_{P1,A} = \pi_{P1,A}|_{max} = \pi_{P1,A}|_{|AEV\_P1| \ne 0}$ (C.14)
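The stage-1 bounds in (C.8), (C.12), and (C.13) can be evaluated with the following Python sketch. The function and parameter names are illustrative, the real-time quantities are assumed to be expressed in nominal ticks, and the values in the usage note are hypothetical.

```python
import math


def drift_factor(rho0):
    """The factor (1 + rho0) - 1/(1 + rho0) that multiplies local delays."""
    return (1 + rho0) - 1 / (1 + rho0)


def observed_skew_bound_p1(pi_p0, eps_pp, rho0):
    """Bound (local ticks) on the observed skew of received messages, (C.8)."""
    return math.floor((1 + rho0) * (pi_p0 + eps_pp))


def accept_skew_bound_p1(pi_p0, eps_pp, a_p1, rho0, asymmetric_voters=True):
    """Real-time skew bound of the P1 Accept outputs, (C.12) or (C.13)."""
    bound = eps_pp + drift_factor(rho0) * a_p1      # (C.12): same message accepted
    if asymmetric_voters:
        bound += pi_p0                              # (C.13): worst case, default per (C.14)
    return bound
```

For instance, with a sender skew bound of 2.0, a communication delay uncertainty of 1.5, and a drift bound of 1e-4, `observed_skew_bound_p1(2.0, 1.5, 1e-4)` gives 3 local ticks.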


C.4. Second stage 

Figure C.5 illustrates the detailed message flow graph for stages 1 and 2 in a 3x3 system. 


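[Figure C.5 (message flow graph, not reproduced in this text version); column labels: BIU, RMU, BIU; annotations: Send INIT at T_P0, Processes.]
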


Figure C.5: Detailed message flow graph for stages 1 and 2 in a 3x3 system 


C.4.1. Effective reception delay for process P2 

Let $B_{P0}$ denote the send delay for process P0. We want to compute the effective reception 
delay for process P2. In general, this delay is measured from the time of some local event to the 
time of reception. We use $T_{P0}$, the local time of transmission of the INIT message in process P0, 
as the local reference event to measure the reception delay from process P0 to process P2. Note 
that instead of the start of the protocol, we choose the send time for process P0 as the reference 
time to measure the reception delay. This approach enables the analysis of the synchronization 
protocols independently of $B_{P0}$. $B_{P0}$ is computed based on a single-stage point-to-point 
synchronous communication model (see section A.10.2.1 of this appendix). 

We need to determine the earliest and latest real times of reception for process P2. Let $B_{P1}$ 
denote the send delay for process P1. $t_{P2,RCV,l}$ and $t_{P2,RCV,h}$ denote the earliest and latest real times, 
respectively, at which an INIT message from a trustworthy RMU node can be received in process 
P2 at the trustworthy BIUs. 


$t_{P2,RCV,l} = t_{P1,A,l} + B_{P1}/(1 + \rho_0) + \Gamma_{PP,l} = t_{P0,l} + 2\Gamma_{PP,l} + (A_{P1} + B_{P1})/(1 + \rho_0)$ (C.15)

$t_{P2,RCV,h} = t_{P1,A,h} + (1 + \rho_0)B_{P1} + \Gamma_{PP,h} = t_{P0,l} + \pi_{P0} + 2\Gamma_{PP,h} + (1 + \rho_0)(A_{P1} + B_{P1})$ (C.16)

$\Gamma_{P0-P2,l}$ denotes the minimum effective message-reception delay for INIT messages in process P2 
and is measured from the latest time at which the trustworthy BIU nodes can send INIT to the 
earliest time at which the BIU nodes can receive INIT messages from the trustworthy RMU 
nodes. 

$\Gamma_{P0-P2,l} = t_{P2,RCV,l} - t_{P0,h} = 2\Gamma_{PP,l} + (A_{P1} + B_{P1})/(1 + \rho_0) - \pi_{P0}$ (C.17)

$\Gamma_{P0-P2,h}$ denotes the maximum effective message-reception delay for INIT messages in process P2 
and is measured from the earliest time at which the trustworthy BIU nodes can send INIT to the 
latest time at which the BIU nodes can receive INIT messages from the trustworthy RMU nodes. 

$\Gamma_{P0-P2,h} = t_{P2,RCV,h} - t_{P0,l} = \pi_{P0} + 2\Gamma_{PP,h} + (1 + \rho_0)(A_{P1} + B_{P1})$ (C.18)

The expected reception delay for process P2 is: 

$R_{P0-P2} = IMP(\Gamma_{P0-P2,l}, \Gamma_{P0-P2,h})$ (C.19)

The IMP function is defined in Appendix B. 

The total effective uncertainty in the real time of reception of the INIT messages in process P2 is: 

$\Gamma_{P0-P2,h} - \Gamma_{P0-P2,l} = 2\pi_{P0} + 2\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P1} + B_{P1})$ (C.20)
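The minimum and maximum effective reception delays of (C.17) and (C.18) can be computed as in the Python sketch below, which also allows (C.20) to be checked numerically. The names are illustrative and the sample values are hypothetical. The IMP function of (C.19) is defined in Appendix B and is deliberately not reproduced here.

```python
def effective_reception_delay_p2(gamma_pp_l, gamma_pp_h, pi_p0, a_p1, b_p1, rho0):
    """Return (minimum, maximum) effective reception delay for process P2."""
    local_delay = a_p1 + b_p1   # Computation Process delay plus send delay of P1
    delay_min = 2 * gamma_pp_l + local_delay / (1 + rho0) - pi_p0    # (C.17)
    delay_max = pi_p0 + 2 * gamma_pp_h + (1 + rho0) * local_delay    # (C.18)
    return delay_min, delay_max
```

With zero drift, point-to-point delays between 5 and 7, a sender skew bound of 2, and local delays of 10 and 4 ticks, the result is (22.0, 30.0). The difference of 8 matches (C.20), since twice the skew bound plus twice the delay uncertainty is 4 + 4.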


C.4.2. Expected time of reception for process P2 

The BIU nodes expect to receive INIT messages at local time $T_{P2,RCV,E}$. 

$T_{P2,RCV,E} = T_{P0} + R_{P0-P2}$ (C.21)

The real-time error for $T_{P2,RCV,E}$ is bounded as follows. A BIU node will receive an INIT message 
from a trustworthy RMU node no earlier than $\mu_{P0-P2,l}$ nominal ticks from $T_{P2,RCV,E}$. 

$\mu_{P0-P2,l} = (1 + \rho_0)R_{P0-P2} - \Gamma_{P0-P2,l}$ (C.22)

A BIU node will receive an INIT message from a trustworthy RMU node no later than $\mu_{P0-P2,h}$ 
nominal ticks from $T_{P2,RCV,E}$. 

$\mu_{P0-P2,h} = \Gamma_{P0-P2,h} - R_{P0-P2}/(1 + \rho_0)$ (C.23)

Let $T_{P2,RCV}$ denote the actual local time at a BIU node when an INIT message from a trustworthy 
RMU node is received. In addition, let $\Delta_{P2,RCV}$ denote the local-time error in $T_{P2,RCV}$. 
$\Delta_{P2,RCV} = T_{P2,RCV} - T_{P2,RCV,E}$ (C.24)

We want to determine a bound for the local-time error in the actual time of reception in process 
P2, denoted by $\Delta_{P2,RCV}|_{max}$. 

$|T_{P2,RCV} - T_{P2,RCV,E}| \le \Delta_{P2,RCV}|_{max}$ (C.25)

$\Delta_{P2,RCV}|_{max}$ is derived as follows. We know that the difference between the expected and the 
actual time of reception at a BIU node for INIT messages from the trustworthy RMU nodes is 
bounded by $\mu_{P0-P2,l}$ and $\mu_{P0-P2,h}$, such that: 

$c_{RCV}(T_{P2,RCV,E}) - \mu_{P0-P2,l} \le c_{RCV}(T_{P2,RCV}) \le c_{RCV}(T_{P2,RCV,E}) + \mu_{P0-P2,h}$ (C.26)

So: 

$|c_{RCV}(T_{P2,RCV}) - c_{RCV}(T_{P2,RCV,E})| \le \max(\mu_{P0-P2,l}, \mu_{P0-P2,h})$ (C.27)

From the constraint that the local clocks be $\rho_0$-bounded and (C.24): 

$|\Delta_{P2,RCV}|/(1 + \rho_0) \le |c_{RCV}(T_{P2,RCV}) - c_{RCV}(T_{P2,RCV,E})|$ (C.28)

Combining (C.27) and (C.28): 

$|\Delta_{P2,RCV}| \le (1 + \rho_0)\max(\mu_{P0-P2,l}, \mu_{P0-P2,h})$ (C.29)

Since $\Delta_{P2,RCV}$ is an integer: 

$|\Delta_{P2,RCV}| \le \lfloor (1 + \rho_0)\max(\mu_{P0-P2,l}, \mu_{P0-P2,h}) \rfloor$ (C.30)

Therefore: 

$\Delta_{P2,RCV}|_{max} = \lfloor (1 + \rho_0)\max(\mu_{P0-P2,l}, \mu_{P0-P2,h}) \rfloor$ (C.31)
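The chain from (C.22) to (C.31) reduces to a short computation once the expected reception delay is known. In the Python sketch below the expected delay is supplied by the caller, because the IMP function that produces it in (C.19) is defined in Appendix B. The function and variable names are illustrative and the sample values are hypothetical.

```python
import math


def reception_error_bound_p2(expected_delay, delay_min, delay_max, rho0):
    """Return (early_margin, late_margin, local_error_bound) for process P2.

    expected_delay       -- expected reception delay, in local ticks
    delay_min, delay_max -- effective reception delays from (C.17) and (C.18)
    """
    early_margin = (1 + rho0) * expected_delay - delay_min     # (C.22)
    late_margin = delay_max - expected_delay / (1 + rho0)      # (C.23)
    # (C.31): convert the larger real-time margin to whole local ticks.
    bound = math.floor((1 + rho0) * max(early_margin, late_margin))
    return early_margin, late_margin, bound
```

For example, with zero drift, an expected delay of 24, and effective delays between 22 and 30, the early margin is 2, the late margin is 6, and the local-time error bound is 6 ticks.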


C.4.3. Bound on the observed relative skew of received messages for process P2 

Let $\Pi_{P2,RCV}$ denote the bound on the relative skew observed in process P2 for the received 
messages from process P1 at trustworthy RMUs. $\Pi_{P2,RCV}$ is measured in local clock ticks. 
$\Pi_{P2,RCV}$ is used to check for agreement among the received inputs and also to check agreement 
with the result of the Accept output. 

The worst-case relative skew for received messages occurs when there are asymmetric eligible 
voters in process P1 at the trustworthy RMU nodes. The bound on the relative skew of the 
Accept outputs in process P1 is $\pi_{P1,A}$. The additional uncertainty in the reception delay measured 
from the time of the Accept outputs to the time of reception in process P2 is 
$\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]B_{P1}$. 

$\Pi_{P2,RCV} = \lfloor (1 + \rho_0)(t_{P2,RCV,h} - t_{P2,RCV,l}) \rfloor$ 
$= \lfloor (1 + \rho_0)\{\pi_{P1,A} + \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]B_{P1}\} \rfloor$ 
$= \lfloor (1 + \rho_0)\{\pi_{P0} + 2\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P1} + B_{P1})\} \rfloor$ (C.32)


C.4.4. Relative skew of the Accept outputs for process P2 

Let AEV_P2 denote the set of asymmetric RMU eligible voters in process P2 at a trustworthy
BIU node. Let $\pi_{P2,A}$ denote the bound on the real-time relative skew of the Accept outputs in
process P2 at trustworthy BIUs. $A_{P2}$ denotes the delay (in local-clock ticks) of the Computation
Process in process P2 measured from the local time of reception of the selected message to the
local time when the Accept output is asserted. If $|AEV\_P1| = 0$ for each trustworthy RMU node,
the trustworthy BIU nodes may have asymmetric RMU nodes in their sets of eligible voters for
process P2 (i.e., $|AEV\_P2| \ne 0$ for some trustworthy BIUs). In this case, the trustworthy BIU
nodes accept within the time range delimited by messages from trustworthy RMU nodes.

$\pi_{P2,A}|_{|AEV\_P2| \ne 0} = \pi_{P1,A}|_{|AEV\_P1| = 0} + \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](B_{P1} + A_{P2})$
$= 2\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P1} + B_{P1} + A_{P2})$ (C.33)

If $|AEV\_P1| \ne 0$ for some trustworthy RMU nodes, the trustworthy BIU nodes do not have
asymmetric RMU nodes in their sets of eligible voters for process P2 (i.e., $|AEV\_P2| = 0$ at each
trustworthy BIU). In this case, the BIU nodes essentially accept on the same message.

$\pi_{P2,A}|_{|AEV\_P2| = 0} = \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]A_{P2}$ (C.34)

From this point on, unless otherwise stated:

$\pi_{P2,A} = \pi_{P2,A}|_{\max} = \pi_{P2,A}|_{|AEV\_P2| \ne 0}$ (C.35)
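The two cases in (C.33) and (C.34), and the choice of the larger one in (C.35), can be compared numerically. The Python sketch below is a hedged illustration: the function names and the sample delays are assumptions for this example, and all inputs are taken to be non-negative.

```python
def drift_factor(rho0):
    """Drift term [(1 + rho0) - 1/(1 + rho0)] used throughout this appendix."""
    return (1 + rho0) - 1 / (1 + rho0)


def accept_skew_p2_cases(eps_pp, rho0, a_p1, b_p1, a_p2):
    """Return (asymmetric_case, symmetric_case) for the P2 Accept skew.

    asymmetric_case follows (C.33): 2*eps_pp + d*(A_P1 + B_P1 + A_P2).
    symmetric_case follows (C.34): eps_pp + d*A_P2.
    """
    d = drift_factor(rho0)
    asymmetric_case = 2 * eps_pp + d * (a_p1 + b_p1 + a_p2)
    symmetric_case = eps_pp + d * a_p2
    return asymmetric_case, symmetric_case


# Hypothetical values; (C.35) adopts the larger (asymmetric) case as the bound.
asym, sym = accept_skew_p2_cases(eps_pp=1.0, rho0=0.5, a_p1=3, b_p1=3, a_p2=3)
pi_p2_a = max(asym, sym)
```

For non-negative delays the asymmetric case always dominates, which is why (C.35) can fix it as the working bound.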


C.5. Third stage 

Figure C.6 illustrates the detailed message flow graph up to stage 3 for a 3x3 system. 


Figure C.6: Detailed message flow graph for stages 1 through 3 in a 3x3 system
(node columns, left to right: BIU, RMU, BIU, RMU; stages 1, 2, and 3)


C.5.1. Effective reception delay for process P3

We need to determine the earliest and latest real times of reception of ECHO messages in
process P3. Let $T_{P1,A,i}$ denote the local time at RMU node i when it asserts the output of its
Accept function in process P1. Let $t_{P1,A,l}$ and $t_{P1,A,h}$ denote the earliest and latest real times,
respectively, at which the trustworthy RMUs can assert the Accept outputs in process P1.

$\pi_{P1,A} = t_{P1,A,h} - t_{P1,A,l}$ (C.36)

Let $B_{P2}$ denote the send delay for process P2. $t_{P3,RCV,l}$ and $t_{P3,RCV,h}$ denote the earliest and latest
real times, respectively, at which ECHO messages from trustworthy BIU nodes can be received
by an RMU node in process P3.

$t_{P3,RCV,l} = t_{P1,A,l} + 2r_{PP,l} + (B_{P1} + A_{P2} + B_{P2})/(1 + \rho_0)$ (C.37)

$t_{P3,RCV,h} = t_{P1,A,h} + 2r_{PP,h} + (1 + \rho_0)(B_{P1} + A_{P2} + B_{P2})$ (C.38)

$r_{P1\text{-}P3,l}$ denotes the minimum effective message-reception delay for ECHO messages in process P3
and is measured from the latest time at which trustworthy RMU nodes can assert their
Accept(INIT) output to the earliest time at which the RMU nodes receive ECHO messages from
the trustworthy BIU nodes.

$r_{P1\text{-}P3,l} = t_{P3,RCV,l} - t_{P1,A,h} = 2r_{PP,l} + (B_{P1} + A_{P2} + B_{P2})/(1 + \rho_0) - \pi_{P1,A}$ (C.39)

$r_{P1\text{-}P3,h}$ denotes the maximum effective message-reception delay for ECHO messages in process
P3 and is measured from the earliest time at which the trustworthy RMU nodes can assert their
Accept(INIT) output to the latest time at which the RMU nodes can receive ECHO messages
from the trustworthy BIU nodes.

$r_{P1\text{-}P3,h} = t_{P3,RCV,h} - t_{P1,A,l} = \pi_{P1,A} + 2r_{PP,h} + (1 + \rho_0)(B_{P1} + A_{P2} + B_{P2})$ (C.40)

The expected reception delay for process P3 is:

$R_{P1\text{-}P3} = IMP(r_{P1\text{-}P3,l}, r_{P1\text{-}P3,h})$ (C.41)

The effective uncertainty in the real time of reception of the ECHO messages in process P3 is:

$r_{P1\text{-}P3,h} - r_{P1\text{-}P3,l} = 2\pi_{P1,A} + 2\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](B_{P1} + A_{P2} + B_{P2})$ (C.42)
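The step from (C.39) and (C.40) to (C.42) can be checked numerically. The Python sketch below is illustrative: the function name and values are assumptions, and it takes the point-to-point delay uncertainty to be eps_pp = r_pp_h - r_pp_l, which is the relation that makes (C.42) follow from the two preceding equations.

```python
def p3_reception_window(r_pp_l, r_pp_h, pi_p1_a, rho0, b_p1, a_p2, b_p2):
    """Return (r_l, r_h), the effective reception delay bounds for process P3.

    r_l follows (C.39) and r_h follows (C.40).
    """
    ticks = b_p1 + a_p2 + b_p2
    r_l = 2 * r_pp_l + ticks / (1 + rho0) - pi_p1_a
    r_h = pi_p1_a + 2 * r_pp_h + (1 + rho0) * ticks
    return r_l, r_h


# Hypothetical values chosen only to exercise the formulas.
rho0 = 1e-4
r_pp_l, r_pp_h, pi_p1_a = 1.0, 1.5, 0.25
b_p1, a_p2, b_p2 = 2, 3, 4
r_l, r_h = p3_reception_window(r_pp_l, r_pp_h, pi_p1_a, rho0, b_p1, a_p2, b_p2)

# Right-hand side of (C.42), assuming eps_pp = r_pp_h - r_pp_l.
eps_pp = r_pp_h - r_pp_l
drift = (1 + rho0) - 1 / (1 + rho0)
width_c42 = 2 * pi_p1_a + 2 * eps_pp + drift * (b_p1 + a_p2 + b_p2)
```

Under these assumptions `r_h - r_l` and `width_c42` agree up to floating-point rounding.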


C.5.2. Expected time of reception for process P3 

RMU node i expects to receive ECHO messages at local time $T_{P3,RCV,E,i}$:

$T_{P3,RCV,E,i} = T_{P1,A,i} + R_{P1\text{-}P3}$ (C.43)

The real-time error for $T_{P3,RCV,E,i}$ is bounded as follows. RMU node i will receive an ECHO
message from a trustworthy BIU node no earlier than $\mu_{P1\text{-}P3,l}$ nominal ticks from $T_{P3,RCV,E,i}$:

$\mu_{P1\text{-}P3,l} = (1 + \rho_0)R_{P1\text{-}P3} - r_{P1\text{-}P3,l}$ (C.44)

RMU node i will receive an ECHO message from a trustworthy BIU node no later than $\mu_{P1\text{-}P3,h}$
nominal ticks from $T_{P3,RCV,E,i}$:

$\mu_{P1\text{-}P3,h} = r_{P1\text{-}P3,h} - R_{P1\text{-}P3}/(1 + \rho_0)$ (C.45)

We want to determine the maximum local-time error for the actual time of reception at the RMU
nodes, denoted by $\Delta_{P3,RCV}|_{\max}$:

$|T_{P3,RCV} - T_{P3,RCV,E}| \le \Delta_{P3,RCV}|_{\max}$ (C.46)

Following the analysis for process P2:

$\Delta_{P3,RCV}|_{\max} = \lfloor (1 + \rho_0)\max(\mu_{P1\text{-}P3,l}, \mu_{P1\text{-}P3,h}) \rfloor$ (C.47)
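The chain from (C.44) and (C.45) to (C.47) can be traced numerically. In the Python sketch below the expected delay R is supplied by the caller, because the IMP function of (C.41) is not defined in this part of the appendix; the function name and sample values are assumptions for illustration.

```python
import math


def reception_time_error_bounds(r_expected, r_l, r_h, rho0):
    """Return (mu_l, mu_h, delta_max) for a receiving process.

    mu_l follows (C.44): (1 + rho0) * R - r_l.
    mu_h follows (C.45): r_h - R / (1 + rho0).
    delta_max follows (C.47): floor((1 + rho0) * max(mu_l, mu_h)).
    """
    mu_l = (1 + rho0) * r_expected - r_l
    mu_h = r_h - r_expected / (1 + rho0)
    delta_max = math.floor((1 + rho0) * max(mu_l, mu_h))
    return mu_l, mu_h, delta_max


# Hypothetical window and expected delay, with no drift for easy checking.
mu_l, mu_h, delta_max = reception_time_error_bounds(
    r_expected=10, r_l=8, r_h=13, rho0=0.0
)
```

With an expected delay that is not centered in the window, the larger of the two offsets determines the bound, as the max in (C.47) indicates.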

C.5.3. Bound on the observed relative skew of received messages for process P3 

Let $\Pi_{P3,RCV}$ denote the bound on the relative skew observed in process P3 for the received
messages from trustworthy sources in process P2. $\Pi_{P3,RCV}$ is measured in local clock ticks.
$\Pi_{P3,RCV}$ is used to check for agreement among the received inputs and also to check agreement
with the result of the Accept output.

The worst-case relative skew for received messages occurs when there are asymmetric eligible
voters in process P2 at the trustworthy BIU nodes. The bound on the relative skew of the Accept
outputs in process P2 is $\pi_{P2,A}$. The additional uncertainty in the reception delay measured from
the time of the Accept outputs in process P2 to the time of reception in process P3 is
$\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]B_{P2}$.

$\Pi_{P3,RCV} = \lfloor (1 + \rho_0)(t_{P3,RCV,h} - t_{P3,RCV,l}) \rfloor$
$= \lfloor (1 + \rho_0)\{\pi_{P2,A} + \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]B_{P2}\} \rfloor$
$= \lfloor (1 + \rho_0)\{3\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P1} + B_{P1} + A_{P2} + B_{P2})\} \rfloor$ (C.48)
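The middle form of (C.48) has the same shape at every stage: the Accept skew of the sending process, plus one point-to-point uncertainty, plus the drift on the send delay, scaled to local ticks and floored. The Python sketch below illustrates that pattern; the function name and sample values are assumptions made for this example.

```python
import math


def observed_skew_bound(pi_prev_accept, eps_pp, rho0, b_prev):
    """Evaluate floor((1 + rho0) * (pi_prev_accept + eps_pp + d * b_prev)).

    pi_prev_accept: Accept-output skew bound of the sending process.
    eps_pp: uncertainty in the point-to-point delay.
    b_prev: send delay of the sending process, in local clock ticks.
    d is the drift term [(1 + rho0) - 1/(1 + rho0)].
    """
    d = (1 + rho0) - 1 / (1 + rho0)
    return math.floor((1 + rho0) * (pi_prev_accept + eps_pp + d * b_prev))


# Hypothetical values: a 1% drift bound adds a fraction of a tick here,
# which the floor then discards.
skew_ticks = observed_skew_bound(pi_prev_accept=4.0, eps_pp=1.0, rho0=0.01, b_prev=10)
```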


C.5.4. Relative skew of the Accept outputs for process P3 

Let AEV_P3 denote the set of asymmetric BIU eligible voters in process P3 at a trustworthy
RMU node. Let $\pi_{P3,A}$ denote the bound on the real-time relative skew of the Accept outputs in
process P3 at the trustworthy RMUs. $A_{P3}$ denotes the delay (in local-clock ticks) of the
Computation Process in process P3 measured from the local time of reception of the selected
message to the local time when the Accept output is asserted. If $|AEV\_P2| = 0$ for each
trustworthy BIU node, the trustworthy RMU nodes may have asymmetric BIU nodes in their sets
of eligible voters for process P3 (i.e., $|AEV\_P3| \ne 0$ for some trustworthy RMU nodes). In this
case, the trustworthy RMU nodes accept within the time range delimited by messages from
trustworthy BIU nodes.

$\pi_{P3,A}|_{|AEV\_P3| \ne 0} = \pi_{P2,A}|_{|AEV\_P2| = 0} + \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](B_{P2} + A_{P3})$
$= 2\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P2} + B_{P2} + A_{P3})$ (C.49)

If $|AEV\_P2| \ne 0$ for some trustworthy BIU nodes, the trustworthy RMU nodes do not have
asymmetric BIU nodes in their sets of eligible voters for process P3 (i.e., $|AEV\_P3| = 0$ for each
trustworthy RMU node). In this case, the RMU nodes essentially accept on the same message.

$\pi_{P3,A}|_{|AEV\_P3| = 0} = \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]A_{P3}$ (C.50)

From this point on, unless otherwise stated:

$\pi_{P3,A} = \pi_{P3,A}|_{\max} = \pi_{P3,A}|_{|AEV\_P3| \ne 0}$ (C.51)


C.6. Fourth stage 

Figure C.7 illustrates the detailed message flow graph up to stage 4 for a 3x3 system. 


Figure C.7: Detailed message flow graph for stages 1 through 4 in a 3x3 system
(node columns, left to right: BIU, RMU, BIU, RMU, BIU)


C.6.1. Effective reception delay for process P4 

We need to determine the earliest and latest real times of reception of ECHO messages from
trustworthy RMU nodes by the BIU nodes in process P4. Let $T_{P2,A,j}$ denote the local time at
which trustworthy BIU node j asserts the output of its Accept function for process P2. $t_{P2,A,l}$ and
$t_{P2,A,h}$ denote the earliest and latest real times, respectively, at which the trustworthy BIUs can
assert their Accept outputs in process P2.

$\pi_{P2,A} = t_{P2,A,h} - t_{P2,A,l}$ (C.52)

Let $B_{P3}$ denote the send delay for process P3. $t_{P4,RCV,l}$ denotes the earliest real time at which
ECHO messages from trustworthy RMU nodes can be received by a BIU node in process P4.

$t_{P4,RCV,l} = t_{P2,A,l} + 2r_{PP,l} + (B_{P2} + A_{P3} + B_{P3})/(1 + \rho_0)$ (C.53)

$t_{P4,RCV,h}$ denotes the latest real time at which ECHO messages from trustworthy RMU nodes can
be received by a BIU node in process P4.

$t_{P4,RCV,h} = t_{P2,A,h} + 2r_{PP,h} + (1 + \rho_0)(B_{P2} + A_{P3} + B_{P3})$ (C.54)

$r_{P2\text{-}P4,l}$ denotes the minimum effective message-reception delay for ECHO messages in process P4
and is measured from the latest time at which the trustworthy BIU nodes can assert their
Accept(ECHO) outputs to the earliest time at which the BIU nodes can receive ECHO messages
from the trustworthy RMU nodes.

$r_{P2\text{-}P4,l} = t_{P4,RCV,l} - t_{P2,A,h} = 2r_{PP,l} + (B_{P2} + A_{P3} + B_{P3})/(1 + \rho_0) - \pi_{P2,A}$ (C.55)

$r_{P2\text{-}P4,h}$ denotes the maximum effective message-reception delay for ECHO messages in process
P4 and is measured from the earliest time at which the trustworthy BIU nodes assert their
Accept(ECHO) outputs to the latest time at which the BIU nodes can receive ECHO messages
from the trustworthy RMU nodes.

$r_{P2\text{-}P4,h} = t_{P4,RCV,h} - t_{P2,A,l} = \pi_{P2,A} + 2r_{PP,h} + (1 + \rho_0)(B_{P2} + A_{P3} + B_{P3})$ (C.56)

The expected reception delay for process P4 is:

$R_{P2\text{-}P4} = IMP(r_{P2\text{-}P4,l}, r_{P2\text{-}P4,h})$ (C.57)

The effective uncertainty in the real time of reception of the ECHO messages in process P4 is:

$r_{P2\text{-}P4,h} - r_{P2\text{-}P4,l} = 2\pi_{P2,A} + 2\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](B_{P2} + A_{P3} + B_{P3})$ (C.58)


C.6.2. Expected time of reception for process P4 

BIU node j expects to receive ECHO messages at local time $T_{P4,RCV,E,j}$:

$T_{P4,RCV,E,j} = T_{P2,A,j} + R_{P2\text{-}P4}$ (C.59)

The real-time error for $T_{P4,RCV,E,j}$ is bounded as follows. BIU node j will receive an ECHO
message from a trustworthy RMU node no earlier than $\mu_{P2\text{-}P4,l}$ nominal ticks from $T_{P4,RCV,E,j}$:

$\mu_{P2\text{-}P4,l} = (1 + \rho_0)R_{P2\text{-}P4} - r_{P2\text{-}P4,l}$ (C.60)

BIU node j will receive an ECHO message from a trustworthy RMU node no later than $\mu_{P2\text{-}P4,h}$
nominal ticks from $T_{P4,RCV,E,j}$:

$\mu_{P2\text{-}P4,h} = r_{P2\text{-}P4,h} - R_{P2\text{-}P4}/(1 + \rho_0)$ (C.61)

We want to determine the maximum local-time error for the actual time of reception at the BIU
nodes in process P4, denoted by $\Delta_{P4,RCV}|_{\max}$:

$|T_{P4,RCV} - T_{P4,RCV,E}| \le \Delta_{P4,RCV}|_{\max}$ (C.62)

Following the analysis for process P2:

$\Delta_{P4,RCV}|_{\max} = \lfloor (1 + \rho_0)\max(\mu_{P2\text{-}P4,l}, \mu_{P2\text{-}P4,h}) \rfloor$ (C.63)


C.6.3. Bound on the observed relative skew of received messages for process P4 

Let $\Pi_{P4,RCV}$ denote the bound on the relative skew observed in process P4 for the received
messages from process P3 at trustworthy RMUs. $\Pi_{P4,RCV}$ is measured in local clock ticks.
$\Pi_{P4,RCV}$ is used to check for agreement among the received inputs and also to check agreement
with the result of the Accept output.

The worst-case relative skew for received messages occurs when there are asymmetric eligible
voters in process P3 at trustworthy RMU nodes. The bound on the relative skew of the Accept
outputs in process P3 at the trustworthy RMUs is $\pi_{P3,A}$. The additional uncertainty in the
reception delay measured from the time of the Accept outputs in process P3 to the time of
reception in process P4 is $\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]B_{P3}$.

$\Pi_{P4,RCV} = \lfloor (1 + \rho_0)(t_{P4,RCV,h} - t_{P4,RCV,l}) \rfloor$
$= \lfloor (1 + \rho_0)\{\pi_{P3,A} + \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]B_{P3}\} \rfloor$
$= \lfloor (1 + \rho_0)\{3\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P2} + B_{P2} + A_{P3} + B_{P3})\} \rfloor$ (C.64)


C.6.4. Relative skew of the Accept outputs for process P4 

Let AEV_P4 denote the set of asymmetric RMU eligible voters in process P4 at a trustworthy
BIU node. Let $\pi_{P4,A}$ denote the bound on the real-time relative skew of the Accept outputs in
process P4 at the trustworthy BIUs. $A_{P4}$ denotes the delay (in local-clock ticks) of the
Computation Process in process P4 measured from the local time of reception of the selected
message to the local time when the Accept output is asserted. If $|AEV\_P3| = 0$ for each
trustworthy RMU node, the trustworthy BIU nodes may have asymmetric RMU nodes in their
sets of eligible voters for process P4 (i.e., $|AEV\_P4| \ne 0$ for some trustworthy BIU nodes). In this
case, the BIU nodes accept within the time range delimited by messages from trustworthy RMU
nodes.

$\pi_{P4,A}|_{|AEV\_P4| \ne 0} = \pi_{P3,A}|_{|AEV\_P3| = 0} + \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](B_{P3} + A_{P4})$
$= 2\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P3} + B_{P3} + A_{P4})$ (C.65)

If $|AEV\_P3| \ne 0$ for some trustworthy RMU nodes, the trustworthy BIU nodes do not have
asymmetric RMU nodes in their sets of eligible voters for process P4 (i.e., $|AEV\_P4| = 0$ for each
trustworthy BIU node). In this case, the BIU nodes essentially accept on the same message.

$\pi_{P4,A}|_{|AEV\_P4| = 0} = \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)]A_{P4}$ (C.66)

From this point on, unless otherwise stated:

$\pi_{P4,A} = \pi_{P4,A}|_{\max} = \pi_{P4,A}|_{|AEV\_P4| \ne 0}$ (C.67)


C.7. Synchronization capture

Figure C.8 illustrates the detailed message flow graph for the synchronization-capture stages 
in a 3x3 system. The nodes executing processes P3C and P4C are called recovering nodes. 



Figure C.8: Detailed message flow graph for synchronization-capture stages in a 3x3 system
(stages 3 and 4)


C.7.1. Bound on the observed relative skew of received messages for process P3C 

Let $\Pi_{P3C,RCV}$ denote the bound on the relative skew observed in process P3C for the received
messages from process P2 at trustworthy BIUs. $\Pi_{P3C,RCV}$ is measured in local clock ticks.
$\Pi_{P3C,RCV}$ is used to check for agreement among the received inputs and also to check agreement
with the result of the Accept output.

The nodes in process P3C receive ECHO messages from process P2 at trustworthy BIUs in
the same real-time range as nodes executing process P3. Therefore:

$\Pi_{P3C,RCV} = \Pi_{P3,RCV}$ (C.68)

C.7.2. Relative skew of the Accept outputs for process P3C 

Recovering RMU nodes synchronize using the ECHO messages from process P2 of the
Synchronization Preservation protocol. Because the recovering nodes may have asymmetric
faulty nodes in their sets of eligible voters, all we know is that they will accept within the time
range delimited by ECHO messages from trustworthy BIU nodes. Let $\pi_{P3C,A}$ denote the bound on
the real-time relative skew of the Accept outputs in process P3C at the good recovering RMUs.
Let $A_{P3C}$ denote the delay (in local-clock ticks) of the Computation Process in process P3C
measured from the local time of reception of the selected message to the local time when the
Accept output is asserted. We assume that the delay of the Computation Process in process P3C
is the same as in process P3.

$A_{P3C} = A_{P3}$ (C.69)

The worst-case real-time relative skew occurs when the trustworthy BIU nodes and the
recovering RMU nodes simultaneously have asymmetric nodes in their sets of eligible voters.
For that case:

$\pi_{P3C,A} = \pi_{P2,A} + \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](B_{P2} + A_{P3C})$
$= 3\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3})$ (C.70)


C.7.3. Bound on the observed relative skew of received messages for process P4C 

Let $\Pi_{P4C,RCV}$ denote the bound on the maximum relative skew observed in process P4C for the
received messages from process P3 at trustworthy RMUs. $\Pi_{P4C,RCV}$ is measured in local clock
ticks. $\Pi_{P4C,RCV}$ is used to check for agreement among the received inputs and also to check
agreement with the result of the Accept output.

The nodes executing process P4C receive ECHO messages from process P3 at trustworthy
RMUs in the same time range as nodes executing process P4. Therefore:

$\Pi_{P4C,RCV} = \Pi_{P4,RCV}$ (C.71)

C.7.4. Relative skew of the Accept outputs for process P4C 

Recovering BIU nodes synchronize using the ECHO messages from process P3 of the
Synchronization Preservation protocol. Because the recovering BIUs may have asymmetric
faulty nodes in their sets of eligible voters, all we know is that they will accept within the time
range delimited by ECHO messages from trustworthy RMU nodes. Let $\pi_{P4C,A}$ denote the bound
on the real-time relative skew of the Accept outputs in process P4C at the good recovering BIUs.
Let $A_{P4C}$ denote the delay (in local-clock ticks) of the Computation Process in process P4C
measured from the local time of reception of the selected message to the local time when the
Accept output is asserted. We assume that the delay of the Computation Process in process P4C
is the same as in process P4.

$A_{P4C} = A_{P4}$ (C.72)

The worst-case real-time relative skew occurs when the trustworthy RMU nodes and the
recovering BIU nodes simultaneously have asymmetric nodes in their sets of eligible voters. For
that case:

$\pi_{P4C,A} = \pi_{P3,A} + \epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](B_{P3} + A_{P4C})$
$= 3\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P2} + B_{P2} + A_{P3} + B_{P3} + A_{P4})$ (C.73)


C.8. Resetting the local time 


C.8.1. Relative skew of the local-time reset for process P4 

Let $T_{P4,A,j}$ denote the local time of the Accept output in process P4 at trustworthy BIU j. $H_{P4}$
denotes the synchronization-reset delay applied by the BIU nodes resetting with respect to the
Accept output in process P4. $T_{P4,H,j}$ denotes the local time at which the next cycle begins for BIU
node j synchronizing with respect to process P4.

$T_{P4,H,j} = T_{P4,A,j} + H_{P4}$ (C.74)

$\pi_{P4,H}$ denotes the bound on the relative skew of the local-time reset for BIU nodes synchronizing
with respect to process P4. Then:

$\pi_{P4,H} = \pi_{P4,A} + [(1 + \rho_0) - 1/(1 + \rho_0)]H_{P4}$
$= 2\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P3} + B_{P3} + A_{P4} + H_{P4})$ (C.75)
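Equation (C.75) adds to the Accept-output skew the drift accumulated while each node counts the reset delay on its own clock. The Python sketch below illustrates that single step; the function name and sample values are assumptions for this example.

```python
def reset_skew(pi_accept, rho0, h_reset):
    """Evaluate pi_accept + [(1 + rho0) - 1/(1 + rho0)] * h_reset.

    pi_accept: relative skew bound of the Accept outputs.
    h_reset: synchronization-reset delay, in local clock ticks.
    """
    drift = (1 + rho0) - 1 / (1 + rho0)
    return pi_accept + drift * h_reset


# Hypothetical values: a deliberately large drift bound makes the effect visible.
pi_p4_h = reset_skew(pi_accept=2.0, rho0=0.5, h_reset=6)
```

With a zero drift bound the reset delay contributes nothing, and the reset skew equals the Accept skew.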


C.8.2. Relative skew of the local-time reset for process P4C 

Let $T_{P4C,A,j}$ denote the local time of the Accept output in process P4C at a good recovering BIU
j. $H_{P4C}$ denotes the synchronization-reset delay applied by the nodes resetting with respect to the
Accept output in process P4C at the good recovering BIUs. $T_{P4C,H,j}$ denotes the local time at
which the next cycle begins for BIU node j synchronizing with respect to process P4C. The BIU
nodes executing process P4C apply the same synchronization-reset delay as the nodes executing
process P4.

$H_{P4C} = H_{P4}$ (C.76)

So:

$T_{P4C,H,j} = T_{P4C,A,j} + H_{P4C} = T_{P4C,A,j} + H_{P4}$ (C.77)

The bound on the relative skew of the Accept output for process P4C at the good recovering BIUs
is given by $\pi_{P4C,A}$. $\pi_{P4C,H}$ denotes the bound on the relative skew of the local-time reset for good
recovering BIU nodes synchronizing with respect to process P4C. Then:

$\pi_{P4C,H} = \pi_{P4C,A} + [(1 + \rho_0) - 1/(1 + \rho_0)]H_{P4}$
$= 3\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P2} + B_{P2} + A_{P3} + B_{P3} + A_{P4} + H_{P4})$ (C.78)


C.8.3. Reset delay for process P3 

Let $T_{P3,A,i}$ denote the local time of the Accept output in process P3 at trustworthy RMU i. $H_{P3}$
denotes the synchronization-reset delay applied by the nodes resetting with respect to the Accept
output in process P3. $T_{P3,H,i}$ denotes the local time at which the next cycle begins for RMU i
synchronizing with respect to process P3.

$T_{P3,H,i} = T_{P3,A,i} + H_{P3}$ (C.79)

$H_{P3}$ is the expected delay from the time when the RMU nodes in process P3 assert their Accept
output until the BIU nodes synchronizing with respect to process P4 reset their local-time clocks.
The bound on the relative skew of the Accept outputs in process P3 at the trustworthy RMUs is
given by $\pi_{P3,A}$. $t_{P3,A,l}$ and $t_{P3,A,h}$ denote the earliest and latest real times, respectively, at which the
Accept outputs can be asserted in process P3 at the trustworthy RMUs. So:

$\pi_{P3,A} = t_{P3,A,h} - t_{P3,A,l}$ (C.80)

$t_{P4,H,l}|_{P3,A}$ and $t_{P4,H,h}|_{P3,A}$ denote the earliest and latest real times, respectively, at which a
trustworthy BIU node synchronizing with respect to process P4 can reset its local-time clock.
$t_{P4,H,l}|_{P3,A}$ and $t_{P4,H,h}|_{P3,A}$ are measured with respect to the Accept outputs in process P3 at the
trustworthy RMUs.

$t_{P4,H,l}|_{P3,A} = t_{P3,A,l} + r_{PP,l} + (B_{P3} + A_{P4} + H_{P4})/(1 + \rho_0)$ (C.81)

$t_{P4,H,h}|_{P3,A} = t_{P3,A,h} + r_{PP,h} + (1 + \rho_0)(B_{P3} + A_{P4} + H_{P4})$ (C.82)

Let $h_{P3,l}$ denote the minimum effective delay from the time the Accept output in process P3 at
trustworthy RMUs is asserted to the time a trustworthy BIU node resets its local-time clock with
respect to process P4.

$h_{P3,l} = t_{P4,H,l}|_{P3,A} - t_{P3,A,h}$
$= r_{PP,l} + (B_{P3} + A_{P4} + H_{P4})/(1 + \rho_0) - \pi_{P3,A}$ (C.83)

$h_{P3,h}$ denotes the maximum effective delay from the time the Accept output in process P3 at
trustworthy RMUs is asserted to the time a trustworthy BIU node resets its local-time clock with
respect to process P4.

$h_{P3,h} = t_{P4,H,h}|_{P3,A} - t_{P3,A,l}$
$= \pi_{P3,A} + r_{PP,h} + (B_{P3} + A_{P4} + H_{P4})(1 + \rho_0)$ (C.84)

$H_{P3}$ is given by:

$H_{P3} = IMP(h_{P3,l}, h_{P3,h})$ (C.85)

The real-time error for $T_{P3,H,i}$ is bounded as follows. A trustworthy BIU node can reset its local-time
clock with respect to process P4 no earlier than $\mu_{P3,H,l}$ nominal ticks from local time $T_{P3,H,i}$ at
a trustworthy RMU node synchronizing with respect to process P3.

$\mu_{P3,H,l} = (1 + \rho_0)H_{P3} - h_{P3,l}$ (C.86)

A trustworthy BIU node can reset its local-time clock with respect to process P4 no later than
$\mu_{P3,H,h}$ nominal ticks from local time $T_{P3,H,i}$ at a trustworthy RMU node synchronizing with respect
to process P3.

$\mu_{P3,H,h} = h_{P3,h} - H_{P3}/(1 + \rho_0)$ (C.87)

Note that this analysis also applies to the real-time error for $T_{P3,H,i}$ with respect to the local-time
reset of nodes synchronizing in process P4C.


C.8.4. Relative skew of the local-time reset between processes P3, and P4 or P4C

Let $\pi_{P3\text{-}P4,H}$ denote the bound on the relative skew of the local-time reset between RMU nodes
synchronizing with respect to process P3 and BIU nodes synchronizing with respect to process
P4.

$\pi_{P3\text{-}P4,H} = \max(\mu_{P3,H,l}, \mu_{P3,H,h})$ (C.88)

$\pi_{P3\text{-}P4C,H}$ denotes the bound on the relative skew of the local-time reset between RMU nodes
synchronizing with respect to process P3 and BIU nodes synchronizing with respect to process
P4C. $\pi_{P3\text{-}P4,H}$ also applies here.

$\pi_{P3\text{-}P4C,H} = \pi_{P3\text{-}P4,H}$ (C.89)
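The path from the reset-delay window of (C.83) and (C.84) to the cross-process skew of (C.88) can be followed numerically. In the Python sketch below the reset delay H_P3 is passed in by the caller, since the IMP function of (C.85) is not defined in this part of the appendix; the midpoint used in the example is only a placeholder assumption, and the function names are hypothetical.

```python
def reset_delay_window(r_pp_l, r_pp_h, pi_p3_a, rho0, b_p3, a_p4, h_p4):
    """Return (h_l, h_h) following (C.83) and (C.84)."""
    ticks = b_p3 + a_p4 + h_p4
    h_l = r_pp_l + ticks / (1 + rho0) - pi_p3_a
    h_h = pi_p3_a + r_pp_h + ticks * (1 + rho0)
    return h_l, h_h


def reset_error_bounds(h_p3, h_l, h_h, rho0):
    """Return (mu_l, mu_h) following (C.86) and (C.87)."""
    mu_l = (1 + rho0) * h_p3 - h_l
    mu_h = h_h - h_p3 / (1 + rho0)
    return mu_l, mu_h


# Hypothetical values with no drift, so the arithmetic is easy to follow.
h_l, h_h = reset_delay_window(1.0, 2.0, 0.5, 0.0, b_p3=3, a_p4=3, h_p4=4)
h_p3 = (h_l + h_h) / 2  # placeholder choice, not the document's IMP function
mu_l, mu_h = reset_error_bounds(h_p3, h_l, h_h, 0.0)
pi_p3_p4_h = max(mu_l, mu_h)  # (C.88)
```

Choosing H_P3 near the middle of the window balances the early and late errors, which keeps the max in (C.88) small.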


C.8.5. Relative skew of the local-time reset for process P3 

The bound on the relative skew of the Accept outputs in process P3 at trustworthy RMUs is
given by $\pi_{P3,A}$. $\pi_{P3,H}$ denotes the bound on the relative skew of the local-time reset for trustworthy
RMU nodes resetting with respect to the Accept output in process P3.

$\pi_{P3,H} = \pi_{P3,A} + [(1 + \rho_0) - 1/(1 + \rho_0)]H_{P3}$
$= 2\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P2} + B_{P2} + A_{P3} + H_{P3})$ (C.90)


C.8.6. Relative skew of the local-time reset for process P3C 

Let $T_{P3C,A,i}$ denote the local time of the Accept output in process P3C at good recovering RMU
i. $H_{P3C}$ denotes the synchronization-reset delay applied by the RMU nodes resetting with respect
to the Accept output in process P3C. $T_{P3C,H,i}$ denotes the local time at which the next cycle begins
for RMU node i synchronizing with respect to process P3C. The RMU nodes executing process
P3C apply the same synchronization-reset delay as the RMU nodes executing process P3.

$H_{P3C} = H_{P3}$ (C.91)

So:

$T_{P3C,H,i} = T_{P3C,A,i} + H_{P3C} = T_{P3C,A,i} + H_{P3}$ (C.92)

The bound on the relative skew for the Accept outputs in process P3C at the good recovering
RMUs is given by $\pi_{P3C,A}$. $\pi_{P3C,H}$ denotes the bound on the relative skew of the local-time reset for
the nodes synchronizing with respect to the Accept output in process P3C.

$\pi_{P3C,H} = \pi_{P3C,A} + [(1 + \rho_0) - 1/(1 + \rho_0)]H_{P3}$
$= 3\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3} + H_{P3})$ (C.93)





C.8.7. Reset delay for process P2 

Let $T_{P2,A,k}$ denote the local time of the Accept output in process P2 at trustworthy BIU k. $H_{P2}$
denotes the synchronization-reset delay applied by the BIU nodes resetting with respect to the
Accept output in process P2. $T_{P2,H,k}$ denotes the local time at which the next cycle begins for BIU
node k synchronizing with respect to process P2.

$T_{P2,H,k} = T_{P2,A,k} + H_{P2}$ (C.94)

$H_{P2}$ is the expected delay from the time when the BIU nodes executing process P2 assert their
Accept outputs until the RMU nodes synchronizing with respect to process P3 reset their local-time
clocks. $t_{P3,H,l}|_{P2,A}$ denotes the earliest real time at which a trustworthy RMU node
synchronizing with respect to process P3 can reset its local-time clock, measured with respect to
the Accept outputs in process P2 at the trustworthy BIUs.

$t_{P3,H,l}|_{P2,A} = t_{P2,A,l} + r_{PP,l} + (B_{P2} + A_{P3} + H_{P3})/(1 + \rho_0)$ (C.95)

Let $t_{P3,H,h}|_{P2,A}$ denote the latest real time at which a trustworthy RMU node synchronizing with
respect to process P3 can reset its local-time clock, measured with respect to the Accept outputs
in process P2 at the trustworthy BIUs.

$t_{P3,H,h}|_{P2,A} = t_{P2,A,h} + r_{PP,h} + (B_{P2} + A_{P3} + H_{P3})(1 + \rho_0)$ (C.96)

Let $h_{P2,l}$ denote the minimum effective delay from the time a trustworthy BIU node in process P2
asserts its Accept output until a trustworthy RMU node in process P3 resets its local-time clock.

$h_{P2,l} = t_{P3,H,l}|_{P2,A} - t_{P2,A,h}$
$= r_{PP,l} + (B_{P2} + A_{P3} + H_{P3})/(1 + \rho_0) - \pi_{P2,A}$ (C.97)

Let $h_{P2,h}$ denote the maximum effective delay from the time a trustworthy BIU node in process P2
asserts its Accept output until a trustworthy RMU node in process P3 resets its local-time clock.

$h_{P2,h} = t_{P3,H,h}|_{P2,A} - t_{P2,A,l}$
$= \pi_{P2,A} + r_{PP,h} + (B_{P2} + A_{P3} + H_{P3})(1 + \rho_0)$ (C.98)

$H_{P2}$ is given by:

$H_{P2} = IMP(h_{P2,l}, h_{P2,h})$ (C.99)

The real-time error for $T_{P2,H,k}$ is bounded as follows. A trustworthy RMU node in process P3 can
reset its local-time clock no earlier than $\mu_{P2,H,l}$ nominal ticks from local time $T_{P2,H,k}$ at a BIU node
synchronizing with respect to process P2.

$\mu_{P2,H,l} = (1 + \rho_0)H_{P2} - h_{P2,l}$ (C.100)

A trustworthy RMU node in process P3 can reset its local-time clock no later than $\mu_{P2,H,h}$ nominal
ticks from local time $T_{P2,H,k}$ at a BIU node synchronizing with respect to process P2.

$\mu_{P2,H,h} = h_{P2,h} - H_{P2}/(1 + \rho_0)$ (C.101)

Note that this analysis also applies to the real-time error for $T_{P2,H,k}$ with respect to the local-time
reset of nodes synchronizing with respect to process P3C.


C.8.8. Relative skew of the local-time reset between processes P2, and P3 or P3C 

Let $\pi_{P2\text{-}P3,H}$ denote the bound on the relative skew of the local-time reset between trustworthy
BIU nodes synchronizing with respect to process P2 and trustworthy RMU nodes synchronizing
with respect to process P3.

$\pi_{P2\text{-}P3,H} = \max(\mu_{P2,H,l}, \mu_{P2,H,h})$ (C.102)

$\pi_{P2\text{-}P3C,H}$ denotes the bound on the relative skew of the local-time reset between trustworthy BIU
nodes synchronized with respect to process P2 and good recovering RMU nodes synchronized
with respect to process P3C. $\pi_{P2\text{-}P3,H}$ also applies here.

$\pi_{P2\text{-}P3C,H} = \pi_{P2\text{-}P3,H}$ (C.103)


C.8.9. Relative skew of the local-time reset for process P2 

The bound on the relative skew of the Accept outputs in process P2 at trustworthy BIUs is
given by $\pi_{P2,A}$. $\pi_{P2,H}$ denotes the bound on the relative skew of the local-time reset for trustworthy
BIU nodes synchronizing with respect to process P2. Then:

$\pi_{P2,H} = \pi_{P2,A} + [(1 + \rho_0) - 1/(1 + \rho_0)]H_{P2}$
$= 2\epsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)](A_{P1} + B_{P1} + A_{P2} + H_{P2})$ (C.104)


C.8.10. Relative skew of the local-time reset for a set including processes P2 and P3C 

Let $t_{P3C,H,l}|_{P2,A}$ denote the earliest real time at which a good recovering RMU node
synchronizing with respect to process P3C can reset its local-time clock, measured with respect to
the Accept outputs in process P2 at trustworthy BIUs.

$t_{P3C,H,l}|_{P2,A} = t_{P2,A,l} + r_{PP,l} + (B_{P2} + A_{P3} + H_{P3})/(1 + \rho_0)$ (C.105)

$t_{P3C,H,h}|_{P2,A}$ denotes the latest real time at which a good recovering RMU node synchronizing with
respect to process P3C can reset its local-time clock, measured with respect to the Accept outputs
in process P2 at trustworthy BIUs.

$t_{P3C,H,h}|_{P2,A} = t_{P2,A,h} + r_{PP,h} + (1 + \rho_0)(B_{P2} + A_{P3} + H_{P3})$ (C.106)

$t_{P2,H,l}$ denotes the earliest real time at which a trustworthy BIU node synchronizing with respect to
process P2 can reset its local-time clock.

$t_{P2,H,l} = t_{P2,A,l} + H_{P2}/(1 + \rho_0)$ (C.107)

$t_{P2,H,h}$ denotes the latest real time at which a trustworthy BIU node synchronizing with respect to
process P2 can reset its local-time clock.

$t_{P2,H,h} = t_{P2,A,h} + H_{P2}(1 + \rho_0)$ (C.108)

$\pi_{P2+P3C,H}$ denotes the bound on the relative skew of the local-time reset for a node set including all
the trustworthy or good recovering nodes synchronizing with respect to process P2 or P3C.

$\pi_{P2+P3C,H} = \max(|t_{P3C,H,h}|_{P2,A} - t_{P2,H,l}|, |t_{P2,H,h} - t_{P3C,H,l}|_{P2,A}|, \pi_{P2,H}, \pi_{P3C,H})$ (C.109)
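The combined bound in (C.109) is the largest of four quantities: the two cross-group gaps between the reset windows and the two within-group skews. The Python sketch below illustrates that selection; the function and argument names are assumptions for this example, and the same shape applies to (C.113).

```python
def set_reset_skew(t_other_l, t_other_h, t_p2_h_l, t_p2_h_h, pi_p2_h, pi_other_h):
    """Evaluate the max of the four terms in (C.109).

    t_other_l, t_other_h: earliest and latest reset times of the other group
        (for example, recovering RMUs in P3C), measured from the P2 Accept outputs.
    t_p2_h_l, t_p2_h_h: earliest and latest reset times of the P2 group.
    pi_p2_h, pi_other_h: within-group reset skew bounds.
    """
    return max(
        abs(t_other_h - t_p2_h_l),  # latest of the other group vs earliest of P2
        abs(t_p2_h_h - t_other_l),  # latest of P2 vs earliest of the other group
        pi_p2_h,
        pi_other_h,
    )


# Hypothetical reset windows and within-group skews.
combined = set_reset_skew(
    t_other_l=10, t_other_h=14, t_p2_h_l=9, t_p2_h_h=12, pi_p2_h=3, pi_other_h=4
)
```

In this example the gap between the latest reset of the other group and the earliest reset of the P2 group dominates.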


C.8.11. Relative skew of the local-time reset for a set including processes P2 and P3 

Let $\pi_{P2+P3,H}$ denote the bound on the relative skew of the local-time reset for a node set
including all trustworthy BIU nodes synchronizing with respect to process P2 or P3. With respect
to process P2, the Accept outputs in process P3 at the trustworthy RMUs and the Accept outputs
in process P3C at the good recovering RMUs can be asserted during the same real-time interval.
In the presence of asymmetric faulty BIU nodes, we know that the time interval of the Accept
outputs in process P3 at the trustworthy RMUs is contained within the time interval of the Accept
outputs in process P3C at the good recovering RMUs. Therefore:

$\pi_{P2+P3,H} \le \pi_{P2+P3C,H}$ (C.110)


C.8.12. Relative skew of the local-time reset for a set including processes P2 and P4C 

Let $t_{P4C,H,l}|_{P2,A}$ denote the earliest real time at which a good recovering BIU node synchronizing
with respect to process P4C can reset its local-time clock, measured with respect to the Accept
outputs in process P2 at the trustworthy BIUs.

$t_{P4C,H,l}|_{P2,A} = t_{P2,A,l} + 2r_{PP,l} + (B_{P2} + A_{P3} + B_{P3} + A_{P4} + H_{P4})/(1 + \rho_0)$ (C.111)

$t_{P4C,H,h}|_{P2,A}$ denotes the latest real time at which a good recovering BIU node synchronizing with
respect to process P4C can reset its local-time clock, measured with respect to the Accept outputs
in process P2 at the trustworthy BIUs.

$t_{P4C,H,h}|_{P2,A} = t_{P2,A,h} + 2r_{PP,h} + (1 + \rho_0)(B_{P2} + A_{P3} + B_{P3} + A_{P4} + H_{P4})$ (C.112)

$\pi_{P2+P4C,H}$ denotes the bound on the relative skew of the local-time reset for a node set including all
BIU nodes synchronizing with respect to process P2 or P4C.

$\pi_{P2+P4C,H} = \max(|t_{P4C,H,h}|_{P2,A} - t_{P2,H,l}|, |t_{P2,H,h} - t_{P4C,H,l}|_{P2,A}|, \pi_{P2,H}, \pi_{P4C,H})$ (C.113)


C.8.13. Relative skew of the local-time reset for a set including processes P2 and P4 

Let $\pi_{P2+P4,H}$ denote the bound on the relative skew of the local-time reset for a node set
including all trustworthy BIU nodes synchronizing with respect to process P2 or P4. With respect
to process P2, the Accept outputs in process P4 at the trustworthy BIUs and the Accept outputs in
process P4C at the good recovering BIUs can be asserted during the same real-time interval. In
the presence of asymmetric faulty RMU nodes, we know that the time interval of the Accept
outputs in process P4 at the trustworthy BIUs is contained within the time interval of the Accept
outputs in process P4C at the good recovering BIUs. Therefore:

$\pi_{P2+P4,H} \le \pi_{P2+P4C,H}$ (C.114)


C.8.14. Relative skew of the local-time reset for a set including processes P3 or P3C 

Let \pi_{P3+P3C,H} denote the bound on the relative skew of the local-time reset for a node set 
including all trustworthy or good recovering RMU nodes synchronizing with respect to process 
P3 or P3C. With respect to process P2, the Accept outputs in process P3 at the trustworthy 
RMUs and the Accept outputs in process P3C at the good recovering RMUs can be asserted 
during the same real time interval. This interval is determined by the time range during which the 
trustworthy BIU nodes executing process P2 send ECHO. In the presence of asymmetric faulty 
BIU nodes, the good recovering RMU nodes executing process P3C may not be able to 
synchronize any better than the duration of this time range. 

\pi_{P3+P3C,H} = \max(\pi_{P3,H}, \pi_{P3C,H}) = \pi_{P3C,H}    (C.115)


C.8.15. Relative skew of the local-time reset for a set including processes P3 and P4C 

Let t_{P4C,H,l|P3,A} denote the earliest real time at which a good recovering BIU node synchronizing 
with respect to process P4C can reset its local-time clock, measured with respect to the Accept 
outputs in process P3 at the trustworthy RMUs. 

t_{P4C,H,l|P3,A} = t_{P3,A,l} + r_{PP,l} + (B_{P3} + A_{P4} + H_{P4})/(1 + \rho_0)    (C.116)

t_{P4C,H,h|P3,A} denotes the latest real time at which a good recovering BIU node synchronizing with 
respect to process P4C can reset its local-time clock, measured with respect to the Accept outputs 
in process P3 at the trustworthy RMUs. 

t_{P4C,H,h|P3,A} = t_{P3,A,h} + r_{PP,h} + (1 + \rho_0)(B_{P3} + A_{P4} + H_{P4})    (C.117)

t_{P3,H,l|P3,A} denotes the earliest real time at which a trustworthy RMU node synchronizing with 
respect to process P3 can reset its local-time clock, measured with respect to the Accept outputs 
in process P3 at the trustworthy RMUs. 

t_{P3,H,l|P3,A} = t_{P3,A,l} + H_{P3}/(1 + \rho_0)    (C.118)

t_{P3,H,h|P3,A} denotes the latest real time at which a trustworthy RMU node synchronizing with 
respect to process P3 can reset its local-time clock, measured with respect to the Accept outputs 
in process P3 at the trustworthy RMUs. 

t_{P3,H,h|P3,A} = t_{P3,A,h} + H_{P3}(1 + \rho_0)    (C.119)

\pi_{P3+P4C,H} denotes the bound on the relative skew of the local-time reset for a node set including all 
trustworthy or good recovering nodes synchronizing with respect to process P3 or P4C. 

\pi_{P3+P4C,H} = \max(|t_{P4C,H,h|P3,A} - t_{P3,H,l|P3,A}|, |t_{P3,H,h|P3,A} - t_{P4C,H,l|P3,A}|, \pi_{P3,H}, \pi_{P4C,H})    (C.120)


C.8.16. Relative skew of the local-time reset for a set including processes P3 and P4 

Let \pi_{P3+P4,H} denote the bound on the relative skew of the local-time reset for a node set 
including all BIU nodes synchronizing with respect to process P3 or P4. With respect to process 
P3, the Accept outputs in process P4 at the trustworthy BIUs and the Accept outputs in process 
P4C at the good recovering BIUs can be asserted during the same real time interval. In the 
presence of asymmetric faulty RMU nodes, we know that the time interval of the Accept outputs 
in process P4 at the trustworthy BIU nodes is contained within the time interval of the Accept 
outputs in process P4C at the good recovering BIU nodes. Therefore: 

\pi_{P3+P4,H} \le \pi_{P3+P4C,H}    (C.121)


C.8.17. Relative skew of the local-time reset for a set including processes P3C and P4C 

Let t_{P3C,H,l|P2,A} denote the earliest real time at which a good recovering RMU node 
synchronizing with respect to process P3C can reset its local-time clock, measured with respect to 
the Accept outputs in process P2 at the trustworthy BIUs. 

t_{P3C,H,l|P2,A} = t_{P2,A,l} + r_{PP,l} + (B_{P2} + A_{P3} + H_{P3})/(1 + \rho_0)    (C.122)

t_{P3C,H,h|P2,A} denotes the latest real time at which a good recovering RMU node synchronizing with 
respect to process P3C can reset its local-time clock, measured with respect to the Accept outputs in 
process P2 at the trustworthy BIUs. 

t_{P3C,H,h|P2,A} = t_{P2,A,h} + r_{PP,h} + (1 + \rho_0)(B_{P2} + A_{P3} + H_{P3})    (C.123)

t_{P4C,H,l|P2,A} denotes the earliest real time at which a good recovering BIU node synchronizing with 
respect to process P4C can reset its local-time clock, measured with respect to the Accept outputs 
in process P2 at the trustworthy BIUs. 

t_{P4C,H,l|P2,A} = t_{P2,A,l} + 2 r_{PP,l} + (B_{P2} + A_{P3} + B_{P3} + A_{P4} + H_{P4})/(1 + \rho_0)    (C.124)

t_{P4C,H,h|P2,A} denotes the latest real time at which a good recovering BIU node synchronizing with 
respect to process P4C can reset its local-time clock, measured with respect to the Accept outputs 
in process P2 at the trustworthy BIUs. 

t_{P4C,H,h|P2,A} = t_{P2,A,h} + 2 r_{PP,h} + (1 + \rho_0)(B_{P2} + A_{P3} + B_{P3} + A_{P4} + H_{P4})    (C.125)

\pi_{P3C+P4C,H} denotes the bound on the relative skew of the local-time reset for a node set including 
all good recovering nodes synchronizing with respect to process P3C or P4C. 

\pi_{P3C+P4C,H} = \max(|t_{P4C,H,h|P2,A} - t_{P3C,H,l|P2,A}|, |t_{P3C,H,h|P2,A} - t_{P4C,H,l|P2,A}|, \pi_{P3C,H}, \pi_{P4C,H})    (C.126)


C.8.18. Relative skew of the local-time reset for a set including processes P4 and P4C 

Let \pi_{P4+P4C,H} denote the bound on the relative skew of the local-time reset for a node set 
including all trustworthy or good recovering BIUs synchronizing with respect to process P4 or 
P4C. With respect to process P3, the Accept outputs in process P4 at the trustworthy BIUs and 
the Accept outputs in process P4C at the good recovering BIUs can be asserted during the same 
real time interval. This interval is determined by the time range during which the trustworthy 
RMU nodes executing process P3 send their ECHO messages. In the presence of asymmetric 
faulty RMU nodes, the good recovering BIUs executing process P4C may not be able to synchronize 
any better than the duration of this time range. 

\pi_{P4+P4C,H} = \max(\pi_{P4,H}, \pi_{P4C,H}) = \pi_{P4C,H}    (C.127)


C.8.19. Relative skew of the local-time reset for a set including all the synchronizing nodes 

Let \pi_{ALL,H} denote the upper bound on the relative skew of the local-time reset for all the 
trustworthy or good recovering nodes executing the synchronization protocol. The following 
relations allow us to reduce the number of relative skews that must be considered: 

\pi_{P2+P3C,H} \ge \pi_{P2,H} and \pi_{P2+P3C,H} \ge \pi_{P3C,H}    (C.128)

\pi_{P2+P3C,H} \ge \pi_{P2+P3,H}    (C.129)

\pi_{P2+P4C,H} \ge \pi_{P2,H} and \pi_{P2+P4C,H} \ge \pi_{P4C,H}    (C.130)

\pi_{P2+P4C,H} \ge \pi_{P2+P4,H}    (C.131)

\pi_{P3+P4C,H} \ge \pi_{P3,H} and \pi_{P3+P4C,H} \ge \pi_{P4C,H}    (C.132)

\pi_{P3+P4C,H} \ge \pi_{P3+P4,H}    (C.133)

\pi_{P3C+P4C,H} \ge \pi_{P3C,H} and \pi_{P3C+P4C,H} \ge \pi_{P4C,H}    (C.134)

So: 

\pi_{ALL,H} = \max(\pi_{P2+P3C,H}, \pi_{P2+P4C,H}, \pi_{P3+P4C,H}, \pi_{P3C+P4C,H})    (C.135)


C.9. Relative local-time skews for source-receiver pairs 


C.9.1. Duration of the synchronization protocol execution 

From a global perspective, the execution of the synchronization protocol ends when all the 
trustworthy and good recovering nodes have reset their local-time clocks. t_{sync,l} and t_{sync,h} denote 
the earliest and latest times, respectively, at which a trustworthy BIU node begins to execute the 
synchronization protocol. \pi_{P0} denotes the bound on the relative local-time skew for the 
trustworthy BIU nodes executing process P0. 

\pi_{P0} = t_{sync,h} - t_{sync,l}    (C.136)


t_{sync,P2,H,l} and t_{sync,P2,H,h} denote the earliest and latest times, respectively, at which a trustworthy 
BIU node synchronizing with respect to process P2 can reset its local-time clock. 

t_{sync,P2,H,l} = t_{sync,l} + 2 r_{PP,l} + (B_{P0} + A_{P1} + B_{P1} + A_{P2} + H_{P2})/(1 + \rho_0)    (C.137)

t_{sync,P2,H,h} = t_{sync,h} + 2 r_{PP,h} + (1 + \rho_0)(B_{P0} + A_{P1} + B_{P1} + A_{P2} + H_{P2})    (C.138)

t_{sync,P3,H,l} and t_{sync,P3,H,h} denote the earliest and latest times, respectively, at which a trustworthy 
RMU node synchronizing with respect to process P3 can reset its local-time clock. 

t_{sync,P3,H,l} = t_{sync,l} + 3 r_{PP,l} + (B_{P0} + A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3} + H_{P3})/(1 + \rho_0)    (C.139)

t_{sync,P3,H,h} = t_{sync,h} + 3 r_{PP,h} + (1 + \rho_0)(B_{P0} + A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3} + H_{P3})    (C.140)

t_{sync,P4,H,l} and t_{sync,P4,H,h} denote the earliest and latest times, respectively, at which a trustworthy 
BIU node synchronizing with respect to process P4 can reset its local-time clock. 

t_{sync,P4,H,l} = t_{sync,l} + 4 r_{PP,l} 
    + (B_{P0} + A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3} + B_{P3} + A_{P4} + H_{P4})/(1 + \rho_0)    (C.141)

t_{sync,P4,H,h} = t_{sync,h} + 4 r_{PP,h} 
    + (1 + \rho_0)(B_{P0} + A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3} + B_{P3} + A_{P4} + H_{P4})    (C.142)

t_{sync,P3C,H,l} and t_{sync,P3C,H,h} denote the earliest and latest times, respectively, at which a good 
recovering RMU node synchronizing with respect to process P3C can reset its local-time clock. 

t_{sync,P3C,H,l} = t_{sync,P3,H,l}    (C.143)

t_{sync,P3C,H,h} = t_{sync,P3,H,h}    (C.144)

t_{sync,P4C,H,l} and t_{sync,P4C,H,h} denote the earliest and latest times, respectively, at which a good 
recovering BIU node synchronizing with respect to process P4C can reset its local-time clock. 

t_{sync,P4C,H,l} = t_{sync,P4,H,l}    (C.145)

t_{sync,P4C,H,h} = t_{sync,P4,H,h}    (C.146)


\pi_{ALL} denotes the bound on the relative local-time skew for all the nodes participating in the 
execution of the synchronization protocol. The calculation of \pi_{ALL} does not include the nodes 
executing the synchronization-capture processes. \delta_{sync}|_{min} and \delta_{sync}|_{max} denote lower and upper 
bounds, respectively, on the real-time duration of the execution of the synchronization protocol 
for the trustworthy nodes. \delta_{sync}|_{min} is measured from the latest time at which a trustworthy node 
begins to execute the protocol to the earliest time at which a trustworthy node resets its local-time 
clock. We choose the following value for \delta_{sync}|_{min}: 

\delta_{sync}|_{min} = -(\pi_{ALL} - \pi_{P0}) 
    + \min[(t_{sync,P2,H,l} - t_{sync,h}), (t_{sync,P3,H,l} - t_{sync,h}), (t_{sync,P4,H,l} - t_{sync,h})]    (C.147)

\delta_{sync}|_{max} is measured from the earliest time at which a trustworthy node begins to execute the 
protocol to the latest time at which a trustworthy node resets its local-time clock. We choose the 
following value for \delta_{sync}|_{max}: 

\delta_{sync}|_{max} = (\pi_{ALL} - \pi_{P0}) 
    + \max[(t_{sync,P2,H,h} - t_{sync,l}), (t_{sync,P3,H,h} - t_{sync,l}), (t_{sync,P4,H,h} - t_{sync,l})]    (C.148)

We define the following variables in order to simplify these expressions for \delta_{sync}|_{min} and \delta_{sync}|_{max}. 

A_{sync,P2,H,l} = 2 r_{PP,l} + (B_{P0} + A_{P1} + B_{P1} + A_{P2} + H_{P2})/(1 + \rho_0)    (C.149)

A_{sync,P3,H,l} = 3 r_{PP,l} + (B_{P0} + A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3} + H_{P3})/(1 + \rho_0)    (C.150)

A_{sync,P4,H,l} = 4 r_{PP,l} + (B_{P0} + A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3} + B_{P3} + A_{P4} + H_{P4})/(1 + \rho_0)    (C.151)

A_{sync,P2,H,h} = 2 r_{PP,h} + (1 + \rho_0)(B_{P0} + A_{P1} + B_{P1} + A_{P2} + H_{P2})    (C.152)

A_{sync,P3,H,h} = 3 r_{PP,h} + (1 + \rho_0)(B_{P0} + A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3} + H_{P3})    (C.153)

A_{sync,P4,H,h} = 4 r_{PP,h} + (1 + \rho_0)(B_{P0} + A_{P1} + B_{P1} + A_{P2} + B_{P2} + A_{P3} + B_{P3} + A_{P4} + H_{P4})    (C.154)

Then: 

\delta_{sync}|_{min} = -\pi_{ALL} + \min(A_{sync,P2,H,l}, A_{sync,P3,H,l}, A_{sync,P4,H,l})    (C.155)

\delta_{sync}|_{max} = \pi_{ALL} + \max(A_{sync,P2,H,h}, A_{sync,P3,H,h}, A_{sync,P4,H,h})    (C.156)
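
The step from (C.147) and (C.148) to (C.155) and (C.156) relies on (C.136): each reset time is the corresponding start time plus an offset, so the \pi_{P0} terms cancel. The following Python sketch checks that cancellation numerically. The function name, argument names, and all numeric values are made up for illustration; the offsets stand for the three quantities defined in (C.149) to (C.151) and in (C.152) to (C.154).

```python
def delta_sync_bounds(t_sync_l, t_sync_h, pi_all, offsets_lo, offsets_hi):
    """Evaluate the long forms (C.147)/(C.148) and the short forms (C.155)/(C.156)."""
    pi_p0 = t_sync_h - t_sync_l                      # (C.136)
    resets_lo = [t_sync_l + d for d in offsets_lo]   # earliest reset times
    resets_hi = [t_sync_h + d for d in offsets_hi]   # latest reset times

    long_min = -(pi_all - pi_p0) + min(t - t_sync_h for t in resets_lo)
    long_max = (pi_all - pi_p0) + max(t - t_sync_l for t in resets_hi)

    short_min = -pi_all + min(offsets_lo)
    short_max = pi_all + max(offsets_hi)
    return (long_min, long_max), (short_min, short_max)


# Integer example values so the comparison is exact.
long_form, short_form = delta_sync_bounds(
    t_sync_l=100, t_sync_h=103, pi_all=5,
    offsets_lo=[20, 30, 40], offsets_hi=[22, 33, 44],
)
```

For these example values both forms give a minimum of 15 and a maximum of 49.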


C.9.2. Bounds on the resynchronization period 

Let \delta_{SP}|_{min} and \delta_{SP}|_{max} denote the values of \delta_{sync}|_{min} and \delta_{sync}|_{max}, respectively, for the 
Synchronization Preservation protocol. T_{SP} denotes the scheduled local time to begin the 
execution of the Synchronization Preservation protocol. P_{min} denotes a lower bound on the 
real-time duration of a synchronization cycle. P_{min} is measured from the time of the synchronization 
reset in one cycle to the time of the synchronization reset in the next. 

P_{min} = T_{SP}/(1 + \rho_0) + \delta_{SP}|_{min}    (C.157)

P_{max} denotes an upper bound for the real-time duration of a synchronization cycle. P_{max} is 
measured from the time of the synchronization reset in one cycle to the time of the 
synchronization reset in the next. 

P_{max} = (1 + \rho_0) T_{SP} + \delta_{SP}|_{max}    (C.158)

P denotes the nominal resynchronization period for the analysis of relative skews. P is measured 
in units of local-clock ticks. We want a count of P local-clock ticks to be larger than the 
maximum duration of a synchronization cycle measured in nominal ticks. This constraint is 
captured by the following expression: 

P/(1 + \rho_0) > P_{max}    (C.159)

So: 

P > (1 + \rho_0) P_{max}    (C.160)

We choose P to be the smallest integer that satisfies the previous inequality. 

P = \lceil (1 + \rho_0) P_{max} \rceil    (C.161)
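
As a worked illustration of (C.157), (C.158) and (C.161), the Python sketch below computes the cycle bounds and the nominal period. The function name, argument names, and numeric values are hypothetical and do not come from the ROBUS design; the sketch follows (C.161) literally and uses the ceiling.

```python
import math


def resync_period(t_sp, rho0, delta_sp_min, delta_sp_max):
    """Return (P_min, P_max, P) following (C.157), (C.158) and (C.161)."""
    p_min = t_sp / (1 + rho0) + delta_sp_min   # fastest clock, shortest protocol run
    p_max = (1 + rho0) * t_sp + delta_sp_max   # slowest clock, longest protocol run
    period = math.ceil((1 + rho0) * p_max)     # nominal period in local-clock ticks
    return p_min, p_max, period


# Assumed example: trigger at tick 100000, drift bound 1e-4,
# protocol duration between 50 and 80 time units.
p_min, p_max, period = resync_period(100_000, 1e-4, 50, 80)
```

With these assumed inputs, (1 + \rho_0) P_{max} is about 100100.009, so the period rounds up to 100101 ticks.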


C.9.3. Relative skew between P2-synchronized BIUs and P3- or P3C-synchronized RMUs 

Let \pi_{P2-P3} denote the bound on the relative local-time skew during the synchronization cycle 
for trustworthy BIU nodes synchronized with respect to process P2 and trustworthy RMU nodes 
synchronized with respect to process P3. 

\pi_{P2-P3} = \pi_{P2-P3,H} + [(1 + \rho_0) - 1/(1 + \rho_0)] P    (C.162)

\pi_{P2-P3C} denotes the bound on the relative local-time skew during the synchronization cycle for 
trustworthy BIU nodes synchronized with respect to process P2 and good recovering RMU nodes 
synchronized with respect to process P3C. \pi_{P2-P3} also applies here. 

\pi_{P2-P3C} = \pi_{P2-P3}    (C.163)


C.9.4. Relative skew between P3-synchronized RMUs and P4- or P4C-synchronized BIUs 

Let \pi_{P3-P4} denote the bound on the relative local-time skew during the synchronization cycle 
for trustworthy RMU nodes synchronized with respect to process P3 and trustworthy BIU nodes 
synchronized with respect to process P4. Then: 

\pi_{P3-P4} = \pi_{P3-P4,H} + [(1 + \rho_0) - 1/(1 + \rho_0)] P    (C.164)

\pi_{P3-P4C} denotes the bound on the relative local-time skew during the synchronization cycle for 
trustworthy RMU nodes synchronized with respect to process P3 and good recovering BIU nodes 
synchronized with respect to process P4C. \pi_{P3-P4} also applies here. 

\pi_{P3-P4C} = \pi_{P3-P4}    (C.165)


C.9.5. Bound on the relative local-time skew for all the nodes executing the 
synchronization protocol 

\pi_{SP,ALL} denotes the value of \pi_{ALL} for Synchronization Preservation. 

\pi_{SP,ALL} = \pi_{ALL,H} + [(1 + \rho_0) - 1/(1 + \rho_0)] P    (C.166)


C.9.6. Generic relative local-time skew between sources and receivers for synchronous 
communication 

For synchronized operations, we would like to use a single value of the relative local-time 
skew between sources and receivers for all point-to-point communication. \pi_{PP,SR} denotes the 
common bound on the relative local-time skew between sources and receivers for synchronized 
communication. From the preceding analysis, there are only two particular source-receiver cases 
that need to be considered to determine a common skew bound: the skew between 
P2-synchronized nodes and P3-synchronized nodes (i.e., \pi_{P2-P3}), and the skew between 
P3-synchronized nodes and P4-synchronized nodes (i.e., \pi_{P3-P4}). We choose \pi_{PP,SR} to be the 
larger of the two. 

\pi_{PP,SR} = \max(\pi_{P2-P3}, \pi_{P3-P4})    (C.167)
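
To make the drift term in (C.162), (C.164) and (C.166) concrete, the Python sketch below adds the worst-case drift accumulated over P ticks to a reset skew and then takes the maximum as in (C.167). The function names, argument names, and input values are assumptions chosen for illustration only.

```python
def cycle_skew(pi_reset, rho0, period):
    """Reset skew plus the drift accumulated over one period of `period` ticks."""
    drift_per_tick = (1 + rho0) - 1 / (1 + rho0)   # fastest versus slowest clock
    return pi_reset + drift_per_tick * period


def common_source_receiver_skew(pi_p2_p3_reset, pi_p3_p4_reset, rho0, period):
    """Common bound in the style of (C.167): the larger of the two pair skews."""
    return max(
        cycle_skew(pi_p2_p3_reset, rho0, period),
        cycle_skew(pi_p3_p4_reset, rho0, period),
    )


# Assumed inputs: reset skews of 3 and 4 ticks, drift 1e-4, P = 100101 ticks.
pi_pp_sr = common_source_receiver_skew(3.0, 4.0, 1e-4, 100_101)
```

For a drift bound of 1e-4 and a period of about 1e5 ticks, the drift term contributes roughly 20 ticks, which dominates the small reset skews assumed here.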


C.10. Specifying the Computation Process and Send Process delays 

A goal of this ROBUS version is to achieve nearly the same tightness for the relative local- 
time skew when executing the Synchronization Preservation, Initial Synchronization, and 
Synchronization Capture protocols. 

The Synchronization Preservation and Initial Synchronization protocols can be decomposed 
into two major phases: agreement generation and agreement propagation. The agreement 
generation phase includes the first two stages of the protocol from the Send Process in P0 to the 
Computation Process in P2. In this phase, the relative skew goes from a bounded initial value 
denoted by \pi_{P0} to a relative skew of the Accept outputs denoted by \pi_{P2,A}, which is independent of 
\pi_{P0} but dependent on the process delays. The agreement propagation phase includes the last two 
stages of the protocol from the Send Process in P2 to the Computation Process in P4, including 
the Computation Processes in P3C and P4C for the Synchronization Capture protocol. The 
synchronization-reset delays are applied with respect to the Accept outputs in processes P2, P3, 
P3C, P4, and P4C. The process delays for this second phase of the protocol are important 
determinants of the final relative local-time skew. 

The approach taken to determine the process delays for the synchronization protocols in this 
version of the ROBUS is as follows. Since we expect the value of \pi_{P0} to be different for the 
Synchronization Preservation and the Initial Synchronization protocols, we specify the Send 
Process delay for process P0 (i.e., B_{P0}) independently for each protocol according to the particular 
timing requirements of the protocol. To ensure that all the versions of the protocol achieve 
approximately the same relative skew, we compute one set of Computation Process and Send 
Process delays for the synchronization processes from P1 on. These delays must be used by all 
the synchronization protocols. 

An additional consideration is the constraint on the minimum data-introduction interval (DII) 
for the send port of the Communication Module, A_{Comm}. This constraint applies to the BIUs and 
the RMUs, and is satisfied by adding functional requirements to the Send Processes at the BIUs 
and the RMUs. The details are described next. 

C.10.1. Computation Process delays 

The Computation Process delay is decomposed into two parts: the reception delay in the 
Receive Process and the computation delay in the Accept Process. For the case of the 
Synchronization Preservation protocol, the reception delay is the delay allocated to ensure that all 
valid messages are received before the computation begins. This delay is similar to the 
deskewing window applied for the synchronous point-to-point communication as discussed in 
Appendix B. A fundamental difference between the reception delay for the synchronization 
protocols and the deskewing window for the synchronous protocols is that in the synchronization 
protocols the relative time spacing between received messages is preserved when forwarding the 
messages to the computation, while in the synchronous protocols the messages are accumulated 
and forwarded at the same time. 

To specify the reception delay, we consider the timing of reception in the Synchronization 
Preservation protocol. The timing of reception in the Initial Synchronization protocol is not 
considered because in that protocol the uncertainty in the time of reception can be extremely 
large, especially for processes P1 and P2, which would result in very large delays for the 
protocol. Having a quick execution is very important for the Synchronization Preservation 
protocol since the duration of the protocol determines how much time is available to execute the 
synchronous protocols for a given resynchronization period. 

Let A_{IS,P1}, A_{IS,P2}, A_{IS,P3}, and A_{IS,P4} denote the Computation Process delays for processes P1, P2, 
P3, and P4 of the Initial Synchronization protocol, respectively. A_{SP,P1}, A_{SP,P2}, A_{SP,P3}, and A_{SP,P4} 
denote the Computation Process delays for processes P1, P2, P3, and P4 of the Synchronization 
Preservation protocol, respectively. A_{SC,P3C} and A_{SC,P4C} denote the Computation Process delays 
for processes P3C and P4C of the Synchronization Capture protocol, respectively. All the 
synchronization protocols have the same Computation Process delays. 

A_{P1} = A_{IS,P1} = A_{SP,P1}    (C.168)

A_{P2} = A_{IS,P2} = A_{SP,P2}    (C.169)

A_{P3} = A_{IS,P3} = A_{SP,P3} = A_{SC,P3C}    (C.170)

A_{P4} = A_{IS,P4} = A_{SP,P4} = A_{SC,P4C}    (C.171)


For process P1 of the Synchronization Preservation protocol, the expected time range of reception 
is as follows (this interval includes all the clock edges at which valid messages can arrive): 

[T_{SP,P1,RCV,E} - A_{PP,RCV}|_{abs-max}, T_{SP,P1,RCV,E} + A_{PP,RCV}|_{abs-max}]    (C.172)

W_{SP,P1} denotes the reception delay applied in process P1. For the Synchronization Preservation 
protocol, this delay must be large enough to ensure that the Accept Process receives the messages 
after the clock edges during which valid messages are expected to arrive. 

W_{SP,P1} = 2 A_{PP,RCV}|_{abs-max} + 1    (C.173)

W_{SP,P2}, W_{SP,P3}, and W_{SP,P4} are similarly defined. Let A_{SP,P2,RCV}|_{max}, A_{SP,P3,RCV}|_{max}, and A_{SP,P4,RCV}|_{max} 
denote the maximum valid local time error for the time of reception of synchronization messages 
in processes P2, P3, and P4 of the Synchronization Preservation protocol. These variables 
correspond to A_{P2,RCV}|_{max}, A_{P3,RCV}|_{max}, and A_{P4,RCV}|_{max} evaluated for the case of the Synchronization 
Preservation protocol. Then: 

W_{SP,P2} = 2 A_{SP,P2,RCV}|_{max} + 1    (C.174)

W_{SP,P3} = 2 A_{SP,P3,RCV}|_{max} + 1    (C.175)

W_{SP,P4} = 2 A_{SP,P4,RCV}|_{max} + 1    (C.176)

Notice that for process P1 the expected time of reception is exactly A_{SP,P1,RCV}|_{max} ticks from the 
left edge of the reception interval in that process. Similar observations apply to processes P2 
through P4. The computation delay is measured from the time the message to be selected is 
presented to the Accept Process until the Accept output is asserted. C_{SP,P1}, C_{SP,P2}, C_{SP,P3}, and 
C_{SP,P4} denote the Accept Process delays for processes P1, P2, P3, and P4, respectively, of the 
Synchronization Preservation protocol. These delays also apply to the Initial Synchronization 
and Synchronization Capture protocols. Then: 


A_{P1} = W_{SP,P1} + C_{SP,P1}    (C.177)

A_{P2} = W_{SP,P2} + C_{SP,P2}    (C.178)

A_{P3} = W_{SP,P3} + C_{SP,P3}    (C.179)

A_{P4} = W_{SP,P4} + C_{SP,P4}    (C.180)
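
The relations (C.173) to (C.180) amount to two small steps per process: widen the maximum reception-time error into a symmetric window, then add the Accept Process delay. A minimal Python sketch follows, with hypothetical function names and made-up tick counts.

```python
def reception_delay(max_reception_error):
    """Window of +/- max_reception_error ticks plus the center tick, as in (C.173)."""
    return 2 * max_reception_error + 1


def computation_delay(max_reception_error, accept_delay):
    """Computation Process delay as reception delay plus Accept Process delay, as in (C.177)."""
    return reception_delay(max_reception_error) + accept_delay


# Assumed example: reception error of up to 3 ticks, Accept Process delay of 2 ticks.
a_p1 = computation_delay(max_reception_error=3, accept_delay=2)
```

In this example the reception window spans 7 ticks, so the Computation Process delay is 9 ticks.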


C.10.2. Send Process delays 

The Send Process delays must be set to ensure proper inter-process communication. The Send 
Process delay for process P0 does not need to be the same for Initial Synchronization and 
Synchronization Preservation. The specification of that value for each protocol is presented 
below. For all the other Send Processes, we specify the delays based on two factors. First, we 
would like to specify the process delays based on the execution of the Synchronization 
Preservation protocol. The timing of execution of the Initial Synchronization protocol is not 
preferred because in that protocol the uncertainty in the time of reception can be extremely large, 
which would result in extremely large process delays for the protocol. The second factor when 
specifying the Send Process delays is the need to satisfy the minimum data-introduction-interval 
constraint for the send port of the Communication Module, A_{Comm}, which must be satisfied at the 
BIUs and the RMUs. 

For the execution of the Synchronization Preservation protocol, the main concern in 
specifying the Send Process delays is ensuring proper coordination between the send and receive 
operations. In particular, the specification of the send delay must take into consideration the 
expected reception delay, the minimum delays in opening the input windows, and the size of the 
input windows. This is not a consideration in the Initial Synchronization protocol since in that 
case all the Computation Processes are enabled at the beginning of the execution of the protocol. 

For the Synchronization Preservation protocol, we must ensure that the time separation 
between the sending of INIT and ECHO messages satisfies the A_{Comm} constraint. The preferred 
method to satisfy this constraint in the Synchronization Preservation protocol is to increase the 
Send Process delays for the INIT messages in processes P0 and P1 and/or the ECHO messages in 
processes P2 and P3 until sufficient separation between them is ensured. 

For the Initial Synchronization protocol, the problem is more complicated. Because the initial 
relative local-time skew can be much larger than the Computation Process and Send Process 
delays, there is no way to meet the A_{Comm} constraint by simply changing the process delays while 
still achieving the other design goals. The preferred solution for this case is to add functionality 
to the Send Processes at the BIUs and the RMUs to force a minimum separation between INIT 
and ECHO messages. However, the buffering of synchronization messages for a bounded but 
unspecified amount of time at a Send Process is an undesired solution because it would result in 
an increase in the bound on the relative local-time skew achieved by the protocol. Instead, the 
solution is based on the observation that for the Initial Synchronization protocol, once the 
Computation Process of a node has performed the computation that triggers the sending of an 
ECHO message (i.e., Accept(INIT) in process P2 at the BIUs, and Accept(ECHO) in process P3 
at the RMUs), there is no need for the node to send an INIT message. To understand this, notice 
that the synchronization protocol achieves synchronization in process P2, and this is then 
propagated to processes P3 and P4 using ECHO messages. For the Initial Synchronization 
protocol, RMUs and BIUs reset their local times with respect to the Accept(ECHO) outputs in 
processes P3 and P4, respectively. Therefore, the fact that an ECHO message is going to be sent 
means that whatever critical timing information was going to be provided by processes P0 and 
P1, it has already been received. Therefore, the INIT messages are redundant from that point on. 
So, for Initial Synchronization, to meet the minimum data-introduction-interval constraint, the 
Send Process must have the following features: 

• The sending of an INIT message must be blocked if the message has not been sent by the 
time the Accept output that triggers the sending of an ECHO message is asserted. 

• The send delay for ECHO messages must be larger than or equal to A_{Comm} - 1. 

The first functional requirement removes redundant INIT messages. The second requirement 
ensures that, if an INIT message is sent at or before the tick at which the Accept output that 
triggers the sending of an ECHO message is asserted, then the ECHO message will be sent at 
least A_{Comm} ticks after the INIT message. 

Let B_{IS,P0}, B_{IS,P1}, B_{IS,P2}, and B_{IS,P3} denote the Send Process delays for processes P0, P1, P2, and 
P3 of the Initial Synchronization protocol, respectively. B_{SP,P0}, B_{SP,P1}, B_{SP,P2}, and B_{SP,P3} denote the 
Send Process delays for processes P0, P1, P2, and P3 of the Synchronization Preservation 
protocol, respectively. For processes P1, P2, and P3: 

B_{P1} = B_{IS,P1} = B_{SP,P1}    (C.181)

B_{P2} = B_{IS,P2} = B_{SP,P2}    (C.182)

B_{P3} = B_{IS,P3} = B_{SP,P3}    (C.183)

B_{IS,P0} and B_{SP,P0} are specified separately for Initial Synchronization and Synchronization 
Preservation. 


C.10.2.1. Send delay for process P0 


C.10.2.1.1. Synchronization Preservation 

The Synchronization Preservation protocol is a time-triggered, event-driven protocol. The 
communication between processes P0 and P1 follows a time-triggered pattern similar to the 
point-to-point communication of the synchronous protocols. After that operation, the rest of the 
Synchronization Preservation protocol proceeds driven by communication and processing events. 
T_{SP} denotes the local-time trigger for the execution of the Synchronization Preservation protocol. 
B_{SP,P0} denotes the send delay for process P0 of the Synchronization Preservation protocol. 
B_{SP,P0}|_{min} denotes the minimum send delay for process P0. B_{SP,P0}|_{min} is assumed to be the time 
needed to prepare the message for transmission. B_{SP,P0} - B_{SP,P0}|_{min} is additional delay added to 
align the send and receive operations. A_{SP,P1,RCVWND} denotes the delay from the communication 
reference time to the opening of the receive window in process P1. A_{SP,P1,RCVWND}|_{min} is the 
minimum value of A_{SP,P1,RCVWND}. R_{PP} denotes the expected point-to-point reception delay. W_{SP,P1} 
is the size of the reception window. W_{SP,P1,pre} is the pre-expectation window (i.e., the size of the 
section of the reception window before the expected time of reception). Considering the analysis 
in Appendix B, W_{SP,P1,pre} corresponds to W_{Deskew,pre}. So: 

W_{SP,P1,pre} = A_{PP,RCV}|_{abs-max}    (C.184)

T_{SP,P0,SND} denotes the send time for process P0. T_{SP,P0,SND} corresponds to T_{P0} in the general 
analysis of the clock synchronization protocols. T_{SP,P1,RCV,E} denotes the expected time of 
reception for process P1. T_{SP,P0-P1,REF} denotes the reference time for the transmission between P0 
and P1. 

T_{SP,P0-P1,REF} = T_{SP}    (C.185)

Two cases must be considered. 

Case 1: B_{SP,P0}|_{min} + R_{PP} \ge A_{SP,P1,RCVWND}|_{min} + W_{SP,P1,pre} 
For this case: 

B_{SP,P0} = B_{SP,P0}|_{min}    (C.186)

A_{SP,P1,RCVWND} = B_{SP,P0}|_{min} + R_{PP} - W_{SP,P1,pre}    (C.187)

So: 

T_{SP,P0,SND} = T_{SP,P0-P1,REF} + B_{SP,P0} = T_{SP} + B_{SP,P0}|_{min}    (C.188)

And: 

T_{SP,P1,RCV,E} = T_{SP,P0,SND} + R_{PP} = T_{SP} + B_{SP,P0}|_{min} + R_{PP}    (C.189)

Case 2: B_{SP,P0}|_{min} + R_{PP} < A_{SP,P1,RCVWND}|_{min} + W_{SP,P1,pre} 
For this case: 

B_{SP,P0} = A_{SP,P1,RCVWND}|_{min} + W_{SP,P1,pre} - R_{PP}    (C.190)

A_{SP,P1,RCVWND} = A_{SP,P1,RCVWND}|_{min}    (C.191)

So: 

T_{SP,P0,SND} = T_{SP,P0-P1,REF} + B_{SP,P0} = T_{SP} + A_{SP,P1,RCVWND}|_{min} + W_{SP,P1,pre} - R_{PP}    (C.192)

And: 

T_{SP,P1,RCV,E} = T_{SP,P0,SND} + R_{PP} = T_{SP} + A_{SP,P1,RCVWND}|_{min} + W_{SP,P1,pre}    (C.193)
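
The two cases for the process P0 send delay can be summarized by one selection rule: whichever side is the bottleneck stays at its minimum, and the other side is stretched so that the expected reception falls exactly one pre-expectation window after the receive window opens. The Python sketch below illustrates this. The function name and argument names are hypothetical, and all values are assumed to be integer tick counts.

```python
def p0_send_alignment(b_min, r_pp, wnd_open_min, w_pre):
    """Return (send_delay, window_open_delay) following Case 1 and Case 2.

    b_min:        minimum send delay of process P0
    r_pp:         expected point-to-point reception delay
    wnd_open_min: minimum delay to open the receive window in process P1
    w_pre:        pre-expectation window of process P1
    """
    if b_min + r_pp >= wnd_open_min + w_pre:
        # Case 1: the sender is the slower side; delay the window opening.
        send_delay = b_min
        window_open_delay = b_min + r_pp - w_pre
    else:
        # Case 2: the receiver is the slower side; delay the send.
        send_delay = wnd_open_min + w_pre - r_pp
        window_open_delay = wnd_open_min
    return send_delay, window_open_delay


case1 = p0_send_alignment(b_min=4, r_pp=10, wnd_open_min=3, w_pre=5)
case2 = p0_send_alignment(b_min=2, r_pp=3, wnd_open_min=6, w_pre=5)
```

In both cases the returned values satisfy send_delay + r_pp = window_open_delay + w_pre, which is the alignment that the expected reception times in the two cases express.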


C.10.2.1.2. Initial Synchronization 

Let \pi_{IS} denote the bound on the relative local-time skew considering BIUs and RMUs during 
the execution of the Initial Synchronization protocol, measured in nominal clock ticks. T_{IS} 
denotes the local time triggering the execution of the Initial Synchronization protocol. The 
timing of the first-stage communication can be analyzed similarly to the point-to-point 
communication for synchronous protocols. 

T_{IS,P0-P1,REF} denotes the reference time for the communication between processes P0 and P1. 
T_{IS,P0,SND} denotes the local time at which process P0 sends the message. T_{IS,P0,SND} corresponds to 
T_{P0} in the general analysis of the clock synchronization protocols. T_{IS,P1,RCV,E} denotes the 
expected time of reception in process P1. B_{IS,P0} denotes the Send Process delay for process P0. 
B_{IS,P0}|_{min} denotes the minimum send delay for process P0. A_{IS,P1,RCVWND} denotes the delay from the 
communication reference time to the opening of the receive window in process P1. W_{IS,P1} 
denotes the size of the reception window in process P1. W_{IS,P1,pre} denotes the pre-expectation 
window in process P1 (i.e., the size of the section of the reception window before the expected 
time of reception). We use T_{IS} as the reference time for the communication between processes P0 
and P1. 

T_{IS,P0-P1,REF} = T_{IS}    (C.194)

We use the analysis for point-to-point communication in Appendix B to determine W_{IS,P1,pre}. To 
determine W_{IS,P1}, we need the maximum error in the expected time of reception for the Initial 
Synchronization protocol messages, A_{IS,PP,RCV}|_{abs-max}. 

A_{IS,PP,RCV}|_{abs-max} = \lfloor (1 + \rho_0)(\pi_{IS} + \max(p_{PP,l}, p_{PP,h})) \rfloor    (C.195)

p_{PP,l} and p_{PP,h} are given in Appendix B. So, for the reception window: 

W_{IS,P1} = 2 A_{IS,PP,RCV}|_{abs-max} + 1    (C.196)

W_{IS,P1,pre} = A_{IS,PP,RCV}|_{abs-max}    (C.197)

B_{IS,P0}|_{min} is assumed to be the time needed to prepare the message for transmission. 

We expect the upper bound on the relative local-time skew during the execution of the first 
stage of the Initial Synchronization protocol to be much larger than any minimum timing 
constraints associated with the process of communication. Based on this, we assume that the 
following condition holds for the communication between processes P0 and P1. 

B_{IS,P0}|_{min} + R_{PP} < A_{IS,P1,RCVWND}|_{min} + W_{IS,P1,pre}    (C.198)

For this case: 

B_{IS,P0} = A_{IS,P1,RCVWND}|_{min} + W_{IS,P1,pre} - R_{PP}    (C.199)

A_{IS,P1,RCVWND} = A_{IS,P1,RCVWND}|_{min}    (C.200)

So: 

T_{IS,P0,SND} = T_{IS,P0-P1,REF} + B_{IS,P0} = T_{IS} + A_{IS,P1,RCVWND}|_{min} + W_{IS,P1,pre} - R_{PP}    (C.201)

And: 

T_{IS,P1,RCV,E} = T_{IS,P0,SND} + R_{PP} = T_{IS} + A_{IS,P1,RCVWND}|_{min} + W_{IS,P1,pre}    (C.202)

C.10.2.2. Send delay for process P1 

B_{P1} is specified based on timing considerations for Synchronization Preservation. B_{P1}|_{min} is 
determined by the implementation. Process P1 sends INIT to process P2. However, the reference 
event used to coordinate the communication between processes P1 and P2 is the trigger time for 
the transmission of the message in process P0. Let T_{SP,P0-P2,REF} denote this reference. 

T_{SP,P0-P2,REF} = T_{SP} + B_{SP,P0}    (C.203)

R_{SP,P0-P2} denotes the expected reception delay for process P2 of the Synchronization Preservation 
protocol. R_{SP,P0-P2} is measured from the send time in process P0 to the expected time of reception 
in process P2. A_{SP,P2,RCVWND} denotes the delay from the reference time to the opening of the input 
window in process P2. A_{SP,P2,RCVWND}|_{min} denotes the minimum value, which is determined by the 
implementation. For proper communication, the following relation must be satisfied: 

R_{SP,P0-P2} = A_{SP,P2,RCVWND} + A_{SP,P2,RCV}|_{max}    (C.204)

Here, both R_{SP,P0-P2} and A_{SP,P2,RCV}|_{max} are functions of B_{P1}, and A_{SP,P2,RCVWND} can be made larger 
than A_{SP,P2,RCVWND}|_{min}. Solving this equation for B_{P1} is not trivial. However, note that R_{SP,P0-P2} 
varies one-to-one with respect to B_{P1}, while A_{SP,P2,RCV}|_{max} changes by approximately 2 \rho_0 B_{P1} for 
each unit step in B_{P1}. This observation allows us to use the following algorithm to determine B_{P1}. 
The notation R_{SP,P0-P2}(B_{P1}) and A_{SP,P2,RCV}|_{max}(B_{P1}) highlights the dependence of R_{SP,P0-P2} and 
A_{SP,P2,RCV}|_{max} on B_{P1}. 

1. $B_{P1} = B_{P1}|_{min}$ 

2. while [$R_{SP,P0-P2}(B_{P1}) < \Delta_{SP,P2,RCV}|_{max}(B_{P1}) + \Delta_{SP,P2,RCVWND}|_{min}$] 

3. { $B_{P1} = B_{P1} + 1$ } 

4. Results: 

5. $B_{P1}$ 

6. $R_{SP,P0-P2} = R_{SP,P0-P2}(B_{P1})$ 

7. $\Delta_{SP,P2,RCV}|_{max} = \Delta_{SP,P2,RCV}|_{max}(B_{P1})$ 

8. $\Delta_{SP,P2,RCVWND} = R_{SP,P0-P2} - \Delta_{SP,P2,RCV}|_{max}$ 
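
The search above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used for ROBUS-2: the two callables stand in for $R_{SP,P0-P2}(B_{P1})$ and $\Delta_{SP,P2,RCV}|_{max}(B_{P1})$, whose actual definitions are given elsewhere in the analysis, and the toy models, the names, and the `max_steps` guard are hypothetical.

```python
from typing import Callable, NamedTuple


class SendDelayResult(NamedTuple):
    b: int             # selected send delay (B_P1)
    r: int             # expected reception delay R(B)
    rcv_err_max: int   # worst-case reception error Delta_RCV|max(B)
    rcvwnd: int        # resulting window-opening delay Delta_RCVWND


def find_send_delay(
    b_min: int,
    r_of_b: Callable[[int], int],
    err_max_of_b: Callable[[int], int],
    rcvwnd_min: int,
    max_steps: int = 1_000_000,  # hypothetical guard against non-termination
) -> SendDelayResult:
    """Increase B one tick at a time until R(B) >= err_max(B) + rcvwnd_min."""
    b = b_min                                            # step 1
    for _ in range(max_steps):
        if r_of_b(b) >= err_max_of_b(b) + rcvwnd_min:    # step 2 exit test
            r, err = r_of_b(b), err_max_of_b(b)          # steps 6 and 7
            return SendDelayResult(b, r, err, r - err)   # step 8
        b += 1                                           # step 3
    raise RuntimeError("no feasible send delay found within max_steps")


# Toy models (assumptions): R grows one tick per tick of B, while the
# worst-case error grows far more slowly.
result = find_send_delay(
    b_min=1,
    r_of_b=lambda b: b + 5,
    err_max_of_b=lambda b: 8 + b // 50,
    rcvwnd_min=4,
)
```

The loop terminates because the reception delay grows one-to-one with the send delay while the error term grows much more slowly, which is the observation the text relies on.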

C.10.2.3. Send delay for process P2 

$B_{P2}$ is specified based on timing considerations for Synchronization Preservation and the 
minimum data-introduction-interval constraint of the Communication Module. $B_{P2}|_{min}$ is 
determined by the implementation. $T_{SP,P1-P3,REF}$ denotes the reference time for the communication 
message propagation from P1 to P3. The event used to coordinate this communication is the time 
of the Accept output in process P1, denoted by $T_{SP,P1,A}$ for the Synchronization Preservation 
protocol. 

$T_{SP,P1-P3,REF} = T_{SP,P1,A}$ (C.205) 

$R_{SP,P1-P3}$ denotes the expected reception delay for process P3 of the Synchronization Preservation 
protocol. $R_{SP,P1-P3}$ is measured from the time of Accept output in process P1 to the expected time 
of reception in process P3. $\Delta_{SP,P3,RCVWND}$ denotes the delay from the reference time to the opening 
of the input window in process P3. $\Delta_{SP,P3,RCVWND}|_{min}$ denotes the minimum value, which is 
determined by the implementation. For proper communication, the following relation must be 
satisfied: 

$R_{SP,P1-P3} = \Delta_{SP,P3,RCVWND} + \Delta_{SP,P3,RCV}|_{max}$ (C.206) 

As for the case of $B_{P1}$, solving this equation for $B_{P2}$ is non-trivial. Therefore, we use here the 
same algorithm used to solve for $B_{P1}$. An additional constraint is that $B_{P2}$ must be larger than or 
equal to $\Delta_{Comm} - 1$. 

1. $B_{P2} = B_{P2}|_{min}$ 

2. if ($B_{P2} < \Delta_{Comm} - 1$), then $B_{P2} = \Delta_{Comm} - 1$ 

3. while [$R_{SP,P1-P3}(B_{P2}) < \Delta_{SP,P3,RCV}|_{max}(B_{P2}) + \Delta_{SP,P3,RCVWND}|_{min}$] 

4. { $B_{P2} = B_{P2} + 1$ } 

5. Results: 

6. $B_{P2}$ 

7. $R_{SP,P1-P3} = R_{SP,P1-P3}(B_{P2})$ 

8. $\Delta_{SP,P3,RCV}|_{max} = \Delta_{SP,P3,RCV}|_{max}(B_{P2})$ 

9. $\Delta_{SP,P3,RCVWND} = R_{SP,P1-P3} - \Delta_{SP,P3,RCV}|_{max}$ 


C.10.2.4. Send delay for process P3 

$B_{P3}$ is specified based on timing considerations for Synchronization Preservation and the 
minimum data-introduction-interval constraint of the Communication Module. $B_{P3}|_{min}$ is 
determined by the implementation. $T_{SP,P2-P4,REF}$ denotes the reference time for the communication 
message propagation from P2 to P4. The event used to coordinate this communication is the time 
of the Accept output in process P2, denoted by $T_{SP,P2,A}$ for the Synchronization Preservation 
protocol. 

$T_{SP,P2-P4,REF} = T_{SP,P2,A}$ (C.207) 

$R_{SP,P2-P4}$ denotes the expected reception delay for process P4 of the Synchronization Preservation 
protocol. $R_{SP,P2-P4}$ is measured from the time of Accept in process P2 to the expected time of 
reception in process P4. Let $\Delta_{SP,P4,RCVWND}$ denote the delay from the reference time to the opening 
of the input window in process P4. $\Delta_{SP,P4,RCVWND}|_{min}$ denotes the minimum value, which is 
determined by the implementation. For proper communication, the following relation must be 
satisfied: 

$R_{SP,P2-P4} = \Delta_{SP,P4,RCVWND} + \Delta_{SP,P4,RCV}|_{max}$ (C.208) 

As for the case of $B_{P2}$, solving this equation for $B_{P3}$ is non-trivial. Therefore, we use here the 
same algorithm used to solve for $B_{P2}$. An additional constraint is that $B_{P3}$ must be larger than or 
equal to $\Delta_{Comm} - 1$. 

1. $B_{P3} = B_{P3}|_{min}$ 

2. if ($B_{P3} < \Delta_{Comm} - 1$), then $B_{P3} = \Delta_{Comm} - 1$ 

3. while [$R_{SP,P2-P4}(B_{P3}) < \Delta_{SP,P4,RCV}|_{max}(B_{P3}) + \Delta_{SP,P4,RCVWND}|_{min}$] 

4. { $B_{P3} = B_{P3} + 1$ } 

5. Results: 

6. $B_{P3}$ 

7. $R_{SP,P2-P4} = R_{SP,P2-P4}(B_{P3})$ 

8. $\Delta_{SP,P4,RCV}|_{max} = \Delta_{SP,P4,RCV}|_{max}(B_{P3})$ 

9. $\Delta_{SP,P4,RCVWND} = R_{SP,P2-P4} - \Delta_{SP,P4,RCV}|_{max}$ 
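
For processes P2 and P3 the search starts from the larger of the implementation minimum and the Communication Module bound. A minimal sketch of that variant follows; the names and the toy delay models are assumptions made for illustration only.

```python
def find_send_delay_with_floor(b_min, delta_comm, r_of_b, err_max_of_b,
                               rcvwnd_min, max_steps=1_000_000):
    """Return (B, R(B), err_max(B), rcvwnd) with B >= delta_comm - 1.

    r_of_b and err_max_of_b are hypothetical callables standing in for
    the expected reception delay and the worst-case reception error.
    """
    # Start from the implementation minimum, raised to Delta_Comm - 1
    # when the minimum falls below that bound.
    b = max(b_min, delta_comm - 1)
    for _ in range(max_steps):
        r, err = r_of_b(b), err_max_of_b(b)
        if r >= err + rcvwnd_min:
            return b, r, err, r - err
        b += 1
    raise RuntimeError("no feasible send delay found within max_steps")


# Toy models (assumptions), with the Communication Module bound dominating.
floor_result = find_send_delay_with_floor(
    b_min=1, delta_comm=10,
    r_of_b=lambda b: b + 5,
    err_max_of_b=lambda b: 8 + b // 50,
    rcvwnd_min=4,
)
```

In this example the bound $\Delta_{Comm} - 1$ already satisfies the loop condition, so the window-opening delay ends up larger than its minimum.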


C.11. Miscellaneous considerations 


C.11.1. Frame Synchronization 

The Frame Synchronization protocol is presented in Section 7 of this document. The protocol 
is executed by recovering nodes in the Synchronization Acquisition mode. The end of the Frame 
Synchronization protocol triggers the execution of the P3C or P4C synchronization-capture 
processes. An assumption for the protocol is the existence of a single valid clique in Preservation 
mode. The protocol monitors the ECHO messages from the trusted nodes identified during Local 
Diagnosis Acquisition. Achieving frame synchronization is equivalent to finding the time gap 
between consecutive executions of the Synchronization Preservation protocol. The Frame 
Synchronization protocol consists of searching for a time interval during which the clique is not 
sending ECHO messages. Finding such an interval indicates that the clique is in between 
computations of clock adjustments, and thus it is an appropriate time to start the execution of the 
Synchronization Capture protocol. The Frame Synchronization protocol can achieve 
synchronization even if, for the node executing the protocol, it is not true that a majority of the 
eligible sources of the opposite kind is trustworthy. The time interval measured by the gap timer, 
called the frame synchronization gap, corresponds to the maximum observed relative skew 
between received ECHO messages from trustworthy nodes. The analysis presented here applies 
to BIUs and RMUs. 

Let $\Delta_{FS,GAP}$ denote the duration of the frame synchronization gap, measured in local clock 
ticks. $\Delta_{FS,GAP}|_{RMU}$ and $\Delta_{FS,GAP}|_{BIU}$ correspond to $\Delta_{FS,GAP}$ for RMUs and BIUs, respectively. 

$\Delta_{FS,GAP}|_{RMU} = \Pi_{SP,P3C,RCV}$ (C.209) 

$\Delta_{FS,GAP}|_{BIU} = \Pi_{SP,P4C,RCV}$ (C.210) 

We choose $\Delta_{FS,GAP}|_{max}$ to be the largest value of $\Delta_{FS,GAP}$. 

$\Delta_{FS,GAP}|_{max} = \max(\Pi_{SP,P3C,RCV}, \Pi_{SP,P4C,RCV})$ (C.211) 

We are interested in the worst-case duration of the Frame Synchronization protocol. $\Delta_{FS}$ 
denotes the actual duration of the execution of the Frame Synchronization protocol measured in 
local clock ticks. To determine the maximum duration of the Frame Synchronization protocol, 
we need to consider the possible patterns of interruption of the interval timer. 

N denotes the total number of BIU nodes, and M denotes the total number of RMU nodes. 
Let $\omega$ denote the number of eligible sources of the opposite kind. 

$\omega|_{max} = \max(N, M)$ (C.212) 

$\Delta_{FS}$ denotes the actual duration of the execution of the Frame Synchronization protocol, 
measured in local-clock ticks. An assumption for the Frame Synchronization protocol is that, 
during its execution, it will encounter at most one execution of the Synchronization Preservation 
protocol. Therefore, a source is allowed to interrupt the interval timer at most once during the 
execution of the protocol. In the worst case, interruptions from eligible sources can consume up 
to $\omega|_{max} \cdot \Delta_{FS,GAP}|_{max}$ local ticks in failed attempts to find a quiet frame synchronization gap (i.e., an 
interval with no gap timer interruptions). Adding an additional $\Delta_{FS,GAP}$ for the last interval, for 
which interruptions would not be allowed, then: 

$\Delta_{FS}|_{max} = (\omega|_{max} + 1) \cdot \Delta_{FS,GAP}|_{max}$ (C.213) 

$\delta_{FS}$ denotes the worst-case duration of the Frame Synchronization protocol measured in nominal 
clock ticks. 

$\delta_{FS}|_{max} = (1 + \rho_0)\Delta_{FS}|_{max}$ (C.214) 
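
As a worked illustration of (C.211) through (C.214), the following sketch computes the worst-case Frame Synchronization duration. The function and argument names and the sample values are assumptions introduced for the example; they do not come from the ROBUS-2 design.

```python
def frame_sync_worst_case(n_biu, m_rmu, gap_rmu, gap_biu, rho0):
    """Return (gap_max, local_ticks_max, nominal_ticks_max).

    gap_rmu and gap_biu stand in for the frame synchronization gap of
    RMUs and BIUs (C.209, C.210); rho0 is the clock drift-rate bound.
    """
    gap_max = max(gap_rmu, gap_biu)            # (C.211)
    omega_max = max(n_biu, m_rmu)              # (C.212)
    # Each eligible source may interrupt the gap timer once, and one
    # more uninterrupted gap is needed to complete the protocol.
    local_ticks = (omega_max + 1) * gap_max    # (C.213)
    nominal_ticks = (1 + rho0) * local_ticks   # (C.214)
    return gap_max, local_ticks, nominal_ticks


# Illustrative values only: 4 BIUs, 3 RMUs, gaps of 20 and 24 ticks.
gap_max, fs_local, fs_nominal = frame_sync_worst_case(4, 3, 20, 24, 1e-4)
```

For these sample values the worst case is five gap intervals of 24 local ticks each, stretched by the drift factor when expressed in nominal ticks.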

The assumption that, during its execution, the Frame Synchronization protocol will encounter 
at most one execution of the Synchronization Preservation protocol imposes the following 
constraint on the minimum duration of the resynchronization period $P_{min}$. 

$P_{min} = \delta_{FS}|_{max}$ (C.215) 

C.11.2. Executing Synchronization Preservation after Synchronization Acquisition 

Recovering BIUs synchronize to an existing clique by executing the Synchronization Capture 
protocol and synchronizing with respect to process P4C. After synchronizing, the recovering 
BIUs behave synchronously just like the existing BIU members of the clique. In the first 
execution of the Synchronization Preservation protocol, the recovering BIUs synchronize with 
respect to process P2. There are two important differences between the existing trustworthy BIU 
members of a clique and good recovering BIUs during this execution of the Synchronization 
Preservation protocol. 

The first difference is that good recovering BIUs do not transmit synchronization messages. 
Even if they transmitted messages, the existing members of the clique would not include those 
messages in the computation of the protocol. Therefore, for the existing trustworthy clique 
members, the result of the synchronization protocol does not depend on the performance of 
recovering BIUs. 

The second (and more important) difference is that the good recovering BIUs are not 
necessarily synchronized to the existing trustworthy BIU clique members as tightly as the 
existing trustworthy BIU clique members are synchronized to each other. This difference is 
significant in the execution of process P2 and in the definitions of the expected reception delay 
$R_{SP,P0-P2}$ and the worst-case local-time difference between the actual time of reception and the 
expected time of reception $\Delta_{SP,P2,RCV}|_{max}$. As presented previously, the definition of these parameters 
uses the relative local-time skew of the transmitting BIUs in process P0 (i.e., $\pi_{P0}$). Because good 
recovering BIUs are not included in the definition of $\pi_{P0}$, their effective reception delay and its 
worst-case error can be different than for the trustworthy BIU clique members. To correct 
this problem, the relative local-time skew used to compute $R_{SP,P0-P2}$ and $\Delta_{SP,P2,RCV}|_{max}$ must include 
the good recovering BIUs. 

$\pi_{SP,P0}|_{P2,RCV}$ denotes the value of $\pi_{P0}$ used to compute $R_{SP,P0-P2}$ and $\Delta_{SP,P2,RCV}|_{max}$ for process P2 
of the Synchronization Preservation protocol. 

$\pi_{SP,P0}|_{P2,RCV} = \pi_{P2+P4C,H} + [(1 + \rho_0) - 1/(1 + \rho_0)]P$ (C.216) 

$\pi_{SP,P0}$ denotes the value of $\pi_{P0}$ for the Synchronization Preservation protocol. Except for the case 
above, $\pi_{SP,P0}$ is: 

$\pi_{SP,P0} = \pi_{P2,H} + [(1 + \rho_0) - 1/(1 + \rho_0)]P$ (C.217) 


C.11.3. Time service accuracy for the Synchronization Preservation protocol 

The PEs receive periodic time updates from the BIUs in the form of INIT messages. These 
messages are triggered by the output of the Accept(INIT) functions in process P2 of the 
Synchronization Preservation protocol. The accuracy of the time service is defined here as the 
maximum error in the expected period between Accept(INIT) outputs in consecutive executions of 
the Synchronization Preservation protocol. 

Consider two consecutive executions of the Synchronization Preservation protocol, denoted by 
SP1 and SP2. $\pi_{P2,A}$ denotes the bound on the real-time relative skew of the Accept outputs in 
process P2 at the trustworthy BIU nodes. $\pi_{P2,A}$ applies to SP1 and SP2. Let $t_{P2,A,l}|_{SP1}$ and $t_{P2,A,h}|_{SP1}$ 
denote the bounds on the earliest and latest real times, respectively, at which the trustworthy BIU 
nodes synchronizing with respect to process P2 of SP1 assert the output of their Accept(INIT) 
functions. Thus: 

$\pi_{P2,A} = t_{P2,A,h}|_{SP1} - t_{P2,A,l}|_{SP1}$ (C.218) 

Let $t_{P2,A,l}|_{SP2}$ and $t_{P2,A,h}|_{SP2}$ denote the bounds on the earliest and latest real times, respectively, 
at which the Accept outputs are asserted in process P2 at the trustworthy BIUs for SP2. The 
relations between the SP1 bounds and the SP2 bounds are constrained by the drift rate of 
the local-time clocks and the validity interval for the Accept outputs in process P2 of SP2. 

tp 2 ,A,ilsP 2 — tp 2 ,A,ilspi + 2r PP j + (H p2 + T SP + B P0 + A P1 + B P1 + A P2 )/(1 + p 0 ) (C.219) 

tp2,A,hlsp2 — tp2.A.hlspi + 2r PPh + (1 + po)(H P2 + T SP + B P0 + A P1 + B P1 + A p2 ) (C.220) 

$P_{SVC}|_{min}$ and $P_{SVC}|_{max}$ denote the minimum and maximum intervals, respectively, between time 
updates for the time-reference service, measured in units of nominal clock ticks. 

Psvclmin — tp 2 ,A,llsP 2 ' t P 2 ,A,hlsPl 

= 2r PPj i + (H p2 + Tsp + B P o + A P i + B P i + A P2 )/(1 + po) - 71 p2 ,a (C.221) 

Psvdmax = tp2,A,hlsP2 “ tp2,A,llsPl 

= 2r PP2l + (1 + po)(H P2 + T$ P + B P() + A P1 + B P1 + A P2 ) + ti p2 ,a (C.222) 

$P_{SVC}$ denotes the expected period between time updates for the time-reference service, measured 
in units of nominal clock ticks. 

$P_{SVC} = (P_{SVC}|_{min} + P_{SVC}|_{max})/2$ (C.223) 

Let $\alpha$ denote the accuracy of $P_{SVC}$. 

$\alpha = (P_{SVC}|_{max} - P_{SVC}|_{min})/2$ 

$= \pi_{P2,A} + \varepsilon_{PP} + (1/2)[(1 + \rho_0) - 1/(1 + \rho_0)](H_{P2} + T_{SP} + B_{P0} + \Delta_{P1} + B_{P1} + \Delta_{P2})$ (C.224) 

Substituting for $\pi_{P2,A}$: 

$\alpha = 3\varepsilon_{PP} + [(1 + \rho_0) - 1/(1 + \rho_0)][(3/2)(\Delta_{P1} + B_{P1} + \Delta_{P2}) + (1/2)(H_{P2} + T_{SP} + B_{P0})]$ (C.225) 
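
The closed form (C.225) is straightforward to evaluate. The sketch below is illustrative only: the argument names are hypothetical stand-ins for the symbols in the equation as read from the printed text, and the sample values are not taken from the ROBUS-2 design.

```python
def time_service_accuracy(eps_pp, rho0, h_p2, t_sp, b_p0, d_p1, b_p1, d_p2):
    """Evaluate the time-service accuracy bound per (C.225).

    eps_pp: point-to-point communication error term
    rho0:   clock drift-rate bound
    d_p1, d_p2 stand in for the Delta_P1 and Delta_P2 terms.
    """
    # Drift factor shared by (C.216), (C.217), (C.224), and (C.225).
    drift = (1 + rho0) - 1 / (1 + rho0)
    return 3 * eps_pp + drift * (
        1.5 * (d_p1 + b_p1 + d_p2) + 0.5 * (h_p2 + t_sp + b_p0)
    )


# With perfect clocks (rho0 = 0) only the communication error remains.
alpha_ideal = time_service_accuracy(2, 0.0, 10, 10, 10, 10, 10, 10)
alpha_drift = time_service_accuracy(2, 1e-4, 10, 10, 10, 10, 10, 10)
```

The comparison shows how the drift term adds to the fixed contribution of the communication error as the intervals between events grow.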

Appendix D. Analysis of the Schedule Update protocol 


The purpose of the Schedule Update mode is to determine the number of messages to be broadcast 
during the PE Communication mode. The PEs are expected to have agreement on their desired schedule 
before the start of the Schedule Update mode. In this mode, the PEs download to the ROBUS their 
agreed-upon schedule, in effect, to program the bus. For a system with N BIUs, each PE sends N 
consecutive messages to its attached BIU with the position in the sequence corresponding to the 
identification number of the PE to be scheduled and the content of the message specifying the desired 
number of scheduled messages for that PE. For each of the N PEs to be scheduled, the ROBUS applies 
the Schedule Update protocol to ensure agreement by the PEs, BIUs, and RMUs on the value received by 
the bus. After the desired schedule is processed, the ROBUS assesses the received entries and determines 
whether they form a valid schedule. The result of this assessment is then forwarded to the PEs. 

The main goal of the Schedule Update service is to allow properly working PEs to communicate as 
desired. The most important objective of the Schedule Update protocol is to ensure agreement on each 
schedule entry even in the presence of faulty PEs. Disagreement on the resulting schedule can result in 
the disintegration of a ROBUS clique during the execution of the schedule in the PE Communication 
mode. Figure D.1 illustrates the message flow graph for the Schedule Update protocol. The processes 
from P0 to P2 form the agreement generation phase, and from P2 to P4 form the agreement propagation 
phase. Section 5 of this document presents the detailed description of the protocol. The protocol is time- 
triggered and uses synchronous communication. The protocol combines message processing and 
diagnostics to ensure agreement even if the faulty PEs outnumber the properly 
working PEs. 


[Figure: message flow among PEs, BIUs, and RMUs over Stages 1 to 4, covering the agreement generation and agreement propagation phases.] 

Figure D.1: Message flow graph for the Schedule Update protocol 

In process P0, the BIUs serve as relays for the messages from the PEs. Thus, the values received in 
process P1 and the voting results are dependent on the status of the PEs and the BIUs. 

D.1.1. PE classification 


Similarly to the BIUs and RMUs, the performance of the PEs depends on whether they are operating 
according to their specification and their state variables are holding correct values. The PEs are expected 
to correctly and consistently perform the distributed SPIDER OS functions, including SPIDER-level 
communication and diagnosis, among others. A group of coordinated PEs that can be relied upon to 
properly perform these functions is referred to as a PE clique. For this version of ROBUS, it is assumed 
that there is at most one active PE clique at any particular time. 

For analysis, the individual PEs are classified according to the following criteria. 

• Goodness: A PE is good if it behaves according to its specification. Otherwise, the PE is bad or 
faulty. 

• Trustworthiness: A PE is trustworthy if it is suitable to properly perform the SPIDER OS 
functions. Otherwise, the PE is untrustworthy. A trustworthy PE must be good, and its state must 
be correct and in agreement with the state of other PE-clique members, if there are any. 

A PE clique consists of one or more trustworthy PEs. 


D.1.2. PE-BIU pair classification 

Because the PEs communicate with the bus through their assigned BIUs, the inputs actually processed 
by the bus depend on the status of the PEs and the BIUs. A PE-BIU pair consists of a PE and its 
corresponding BIU. Each pair is handled as a unit for the analysis of the input to the Schedule Update 
protocol. 

Each PE-BIU pair is classified according to the following categories. 

• Trustworthy: A PE-BIU pair in which the PE and the BIU are individually trustworthy. 

• Benign: An untrustworthy PE-BIU pair in which the PE-BIU pair either broadcasts valid values or is 
safely removed from eligibility in voting operations. 

• Symmetric: An untrustworthy PE-BIU pair in which the BIU is symmetric. 

• Asymmetric: An untrustworthy PE-BIU pair in which the BIU is asymmetric. 


D.1.3. Agreement generation phase 

The voters for process P1 are the PE-BIU pairs. For proper operation of the protocol, it is required 
that the trustworthy BIUs in process P0 broadcast PE_ERROR only if their attached PEs are 
untrustworthy. Thus, a PE_ERROR message is an explicit indication that the PE-BIU pair is 
untrustworthy. The definition of the protocol and the properties of the diagnostic system ensure that the 
set of eligible voters in process P1 includes all of the trustworthy PE-BIU pairs. 

In what follows, we consider the k-th execution of the Schedule Update protocol, which corresponds 
to the processing of the desired number of messages for PE k in the schedule computed by the PE clique. 
Let $v_{PEk}$ denote the desired number of messages for PE k. All trustworthy PEs in the PE clique agree on 
this value. 


Let $EV_{P1,i}$ and $EV_{P2,j}$ denote the sets of eligible voters for process P1 at RMU i and process P2 at BIU j, 
respectively. $Twy\_EV_{P1,i}$, $Sym\_EV_{P1,i}$, and $Asym\_EV_{P1,i}$ denote the sets of trustworthy, symmetric, and 
asymmetric eligible voters for process P1 at RMU i, respectively. $Twy\_EV_{P2,j}$, $Sym\_EV_{P2,j}$, and 
$Asym\_EV_{P2,j}$ denote the sets of trustworthy, symmetric, and asymmetric eligible RMU voters for process 
P2 at BIU j, respectively. 

It is assumed that the set of eligible voters for process P2 at the trustworthy nodes contains more 
trustworthy sources than untrustworthy ones. That is: 

$|Twy\_EV_{P2,j}| > |Sym\_EV_{P2,j}| + |Asym\_EV_{P2,j}|$ for process P2 at each trustworthy BIU j. 

Following the general protocol theory presented in Appendix A, it is also assumed that in process P1 or 
process P2 the eligible voters at the trustworthy nodes do not include asymmetric sources. That is: 

$|Asym\_EV_{P1,i}| = 0$ for process P1 at each trustworthy RMU i, or 
$|Asym\_EV_{P2,j}| = 0$ for process P2 at each trustworthy BIU j. 

The following properties hold for the agreement generation phase. 

Agreement propagation: If $|Twy\_EV_{P1,i}| > |Sym\_EV_{P1,i}| + |Asym\_EV_{P1,i}|$ for process P1 at each 
trustworthy RMU i, then the results of the voting operations in process P2 at the trustworthy BIU nodes are 
equal to $v_{PEk}$. 


Proof: For stage 1, the trustworthy PE-BIU pairs exactly agree on their desired number of messages, 
and they form a majority of eligible voters at each trustworthy RMU. Therefore, all of the word vote 
results for process P1 at the trustworthy RMUs will output the value received from the trustworthy 
PE-BIU pairs, namely $v_{PEk}$. For stage 2, the trustworthy RMUs exactly agree on the value $v_{PEk}$, and they 
form a majority of eligible voters at each trustworthy BIU. Therefore, all of the word vote results for 
process P2 at the trustworthy BIUs will output the value $v_{PEk}$. 

Agreement generation: The voting results for process P2 at the trustworthy BIU nodes exactly agree. 

Proof: Two cases must be considered. 

Case 1: $|Asym\_EV_{P1,i}| = 0$: If there are no asymmetric eligible voters in process P1, all of the 
trustworthy RMUs agree on their set of eligible voters and the corresponding voting inputs. Therefore, 
the voting results at the trustworthy RMUs exactly agree. The assumption that the trustworthy RMUs 
form a majority of eligible voters for process P2 at the trustworthy BIUs ensures that the agreement in 
process P1 propagates to process P2. 

Case 2: $|Asym\_EV_{P2,j}| = 0$: If there are no asymmetric eligible voters in process P2, all of the 
trustworthy BIUs agree on their set of eligible voters and the corresponding voting inputs. Therefore, the 
voting results exactly agree. 

The protocol theory presented in Appendix A requires that, in general, the set of eligible 
voters for process P1 at the trustworthy nodes contains more trustworthy BIUs than untrustworthy ones. 
Without the ineligibility of PE_ERROR inputs in process P1, that would indeed be the case. However, 
since the voters are PE-BIU pairs, and it is known that a PE_ERROR input does not correspond to a valid 
desired number of messages for PE k, it is not always true that $|Twy\_EV_{P1,i}| > |Sym\_EV_{P1,i}| + 
|Asym\_EV_{P1,i}|$ at each trustworthy RMU i. Nevertheless, the protocol is able to ensure agreement in 
process P2, as shown here. Note that a PE-BIU pair can be asymmetric only if the BIU is asymmetric. A 
PE cannot be asymmetric because it only communicates with its assigned BIU. Therefore, the 
assumption in the general protocol theory that there are no simultaneous BIU and RMU asymmetric 
eligible voters in processes P1 and P2 at the trustworthy nodes is the same assumption that ensures 
agreement generation for the Schedule Update protocol. The protocol ensures agreement in process P2 
irrespective of the values submitted by the PEs. It is not even required to have a group of PEs that agree 
on their values. Only the status of the BIUs and RMUs is relevant in the generation of agreement. 
However, if it is not true that the trustworthy PE-BIU pairs form a majority of eligible voters in process 
P1 at each trustworthy RMU, it is possible that the result in process P2 is not the value submitted by any 
trustworthy PE. 

In addition, note that the meaning of a PE_ERROR message is different in stage 1 and stage 2. In stage 
1, a received PE_ERROR message means that the source PE-BIU pair is untrustworthy. In stage 2, a 
received PE_ERROR message means that the source RMU determined that there is no agreement among 
the eligible PE-BIU pairs on the desired number of messages for PE k. A vote result of PE_ERROR in 
process P2 means that a majority of the eligible RMUs determined that there is no agreement among the 
PE-BIU pairs, or that the eligible RMUs do not agree on the desired number of messages for PE k. Both 
conditions are invalid for a schedule update. 

Finally, note that disagreement with the result of the vote is not used as an error check in processes P1 
or P2. The diagnostic system is required to generate accusations only if it is known with certainty that the 
accused is untrustworthy. In process P1, it is not possible to determine if a disagreement is caused by the 
PE or the BIU. In process P2, it is not possible to determine if a disagreement is caused by an 
untrustworthy RMU or by asymmetric PE-BIU pairs. 


D.1.4. Agreement propagation phase 

It is assumed that the set of eligible voters for processes P3 and P4 at the trustworthy nodes contains 
more trustworthy sources than untrustworthy ones. That is: 

$|Twy\_EV_{P3,k}| > |Sym\_EV_{P3,k}| + |Asym\_EV_{P3,k}|$ for process P3 at each trustworthy RMU k, and 

$|Twy\_EV_{P4,l}| > |Sym\_EV_{P4,l}| + |Asym\_EV_{P4,l}|$ for process P4 at each trustworthy BIU l. 

The following property holds for the agreement propagation phase. 

Agreement propagation: The results of the voting operations for process P3 at the trustworthy RMUs 
and for process P4 at the trustworthy BIUs are equal to the result for process P2 at the trustworthy BIUs. 

Proof: For process P3, exact agreement propagation follows from the fact that the trustworthy BIUs 
agree on the result for process P2 and they form a majority among the eligible voters in process P3. The 
conditions are similar for the propagation of agreement from process P3 to process P4. 

D.1.5. Schedule assessment 


The received schedule is the list of results for the N executions of the Schedule Update protocol. The 
ROBUS examines the received schedule to determine its validity. The loaded schedule is the schedule 
that is accepted by the ROBUS for use in the PE Communication mode. Two schedule validity rules are 
defined for this version of the ROBUS: (1) none of the schedule entries is equal to PE_ERROR, and (2) 
the sum of all the schedule entries is less than or equal to the total number of PE messages that can be 
processed in the PE Communication mode. The received schedule becomes the loaded schedule if it 
complies with both of these rules. Otherwise, the default schedule is loaded. 
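
The two validity rules can be sketched as follows. This is a minimal illustration, assuming schedule entries are non-negative integers and that PE_ERROR is represented by a sentinel value; the names `load_schedule`, `capacity`, and the sentinel representation are hypothetical and not taken from the ROBUS-2 implementation.

```python
PE_ERROR = "PE_ERROR"  # hypothetical sentinel for an invalid schedule entry


def load_schedule(received, default, capacity):
    """Return the schedule to load: the received one if valid, else the default.

    received: list of N entries, each an int or PE_ERROR
    default:  the default schedule
    capacity: total number of PE messages that can be processed in the
              PE Communication mode
    """
    # Rule 1: none of the schedule entries is equal to PE_ERROR.
    no_errors = all(entry != PE_ERROR for entry in received)
    # Rule 2: the sum of all entries does not exceed the capacity.
    # The sum is evaluated only when rule 1 holds, so every entry is an int.
    valid = no_errors and sum(received) <= capacity
    return list(received) if valid else list(default)


default_schedule = [1, 1, 1]
loaded_ok = load_schedule([2, 3, 1], default_schedule, capacity=8)
loaded_err = load_schedule([2, PE_ERROR, 1], default_schedule, capacity=8)
loaded_over = load_schedule([4, 4, 4], default_schedule, capacity=8)
```

Only the first call returns the received schedule; the other two fall back to the default because they violate rule 1 and rule 2, respectively.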

It is possible that entries in the received schedule are not equal to the corresponding values submitted 
by trustworthy PEs. The schedule validity rules defined for this version of the ROBUS offer only some 
protection against the loading of an undesired schedule. The rules can be augmented by appending some 
sort of check word to the desired schedule list computed by the PEs and comparing the Schedule Update 
protocol result for the checksum entry against a check word computed by the ROBUS for the received 
schedule. Increasing the error coverage of the schedule validity rules reduces the likelihood that the 
loaded schedule does not meet the communication requirements of the PEs. 

Appendix E. Analysis of the PE Broadcast and Accusation Exchange protocols 

Two different protocols are executed in the PE Communication mode. The PE Broadcast protocol is 
an agreement protocol with embedded diagnostic processing. This protocol is used in combination with a 
routing function to ensure that the PE messages are broadcast according to the communication schedule. 
The Accusation Exchange protocol is intended to enhance the diagnostic capabilities of the ROBUS by 
allowing fine time granularity for reconfigurations while ensuring that the suspicion-based accusations 
comply with the required properties of the diagnostic system. 


E.1.1. Bus access pattern 

The BIUs broadcast PE messages on the BIU-to-RMU links according to the communication schedule. 
The access pattern is a time-indexed, as-soon-as-possible round-robin in which the first message is sent at 
a predetermined local time and succeeding messages are sent at regular time intervals. The actual data 
introduction interval (DII) for the transmissions (denoted by $\Delta_{stream}$) must be larger than the minimum DII 
of the Communication Module (denoted by $\Delta_{Comm}$) and the minimum DII of the Computation Module 
(denoted by $\Delta_{Comp}$). 
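
The access pattern described above amounts to a fixed-interval transmission timetable. The following sketch is illustrative only; the function name, argument names, and tick values are assumptions, and the check mirrors the stated requirement that the transmission DII be larger than both module minimums.

```python
def transmission_times(t_first, delta_stream, count,
                       delta_comm_min, delta_comp_min):
    """Return the local send times of `count` consecutive messages.

    t_first:        predetermined local time of the first message
    delta_stream:   actual data introduction interval (DII)
    delta_comm_min: minimum DII of the Communication Module
    delta_comp_min: minimum DII of the Computation Module
    """
    if delta_stream <= max(delta_comm_min, delta_comp_min):
        raise ValueError("delta_stream must exceed both minimum DIIs")
    # Succeeding messages follow the first at regular intervals.
    return [t_first + k * delta_stream for k in range(count)]


# Illustrative values only.
times = transmission_times(t_first=100, delta_stream=8, count=3,
                           delta_comm_min=5, delta_comp_min=7)
```

A smaller `delta_stream` packs more messages into the same diagnostic cycle, which is why the transmission DII is one of the main performance determinants.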

The RMUs receive and route the messages from the BIUs according to the communication schedule. 
The transmission DII for the RMUs is the same as for the BIUs. The route function ensures that only the 
scheduled BIU is allowed to access the RMU-to-BIU links. At the receiving end, the BIUs receive and 
vote on the messages from the RMUs, thus ensuring that failed RMUs will not corrupt the results. Note 
that, in effect, the output of the RMUs is a stream composed of the individual streams from the scheduled 
BIU sources. 

Most of the time in a diagnostic cycle is available for the broadcast of PE messages. The transmission 
DII and the processing latency of the PE Broadcast protocol are the main performance determinants of the 
ROBUS. A reduction in $\Delta_{stream}$ and the processing latency increases the total throughput capability of the 
bus. 


E.1.2. PE Broadcast protocol 

The PE Broadcast protocol is an interactive consistency protocol based on the generic theory presented 
in Appendix A and augmented with diagnostic processing capabilities to safely handle transmissions by 
untrustworthy trusted sources and faulty distrusted sources. The protocol also allows trustworthy nodes 
to observe and diagnose good recovering nodes by the same means as for trusted nodes. No assumption 
is made about a relation between the content of the communication schedule and the health and diagnostic 
status of BIUs. 

Figure E.1 illustrates the message flow graph for the protocol. Section 5 of this document presents a 
detailed description of the protocol. The main purpose of the protocol is to perform a broadcast function 
in which a message from the source PE is delivered to all the PEs connected to trustworthy BIUs. From 
the perspective of the ROBUS, the actual content of the message is arbitrary and meaningless. Thus, 
there is no need for an agreement propagation phase. 

Figure E.1: Message flow graph for the PE Broadcast protocol 


Since the Collective Diagnosis protocol ensures agreement among all the trustworthy nodes on the 
conviction results (see Appendix F), the result sent to the PEs by the trustworthy BIUs for a convicted 
BIU source is SOURCE_ERROR irrespective of the actual message sent by the source BIU or the voting 
results at P1 or P2. Likewise, if the trustworthy BIUs agree on accusing the source BIU, the result sent to 
the PEs will also be SOURCE_ERROR irrespective of the actual voting result in process P2. 

We consider the properties of the protocol at the output of the voter in process P2. Irrespective of the 
health status of the source BIU, it is assumed that the set of eligible voters at each trustworthy BIU 
contains more trustworthy RMUs than untrustworthy ones. That is: 

$|Twy\_EV_{P2,j}| > |Sym\_EV_{P2,j}| + |Asym\_EV_{P2,j}|$ at each trustworthy BIU j. 

Validity of the voting result in process P2: If the source BIU is trustworthy, then the result of the 
vote in process P2 at the trustworthy BIUs is equal to the value sent by the source. 

Proof: The source BIU sends the same message to all the RMUs. Since the source is trustworthy, the 
trustworthy RMUs do not detect input errors or have accusations against it. Therefore, the source BIU is 
eligible in process P1 and the result at all the trustworthy RMUs is equal to the message sent by the 
source. Since the trustworthy RMUs are a majority of the eligible voters for process P2 at the trustworthy 
BIUs, the result of the vote is equal to the value sent by the source BIU. 
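
The validity argument can be illustrated with a minimal sketch. The Python fragment below models the P2 vote at a BIU as an exact-match majority over the messages relayed by its eligible RMUs. The function name majority_vote, the node labels, and the rule that a value must be sent by more than half of the eligible voters are assumptions made for illustration; the actual voter is specified in Section 5.

```python
from collections import Counter

NO_MAJORITY = "NO_MAJORITY"

def majority_vote(inputs, eligible):
    """Exact-match majority over the inputs of the eligible voters.

    inputs:   dict mapping voter id -> received value
    eligible: set of voter ids whose inputs are counted
    Assumption: a value wins only if more than half of the eligible
    voters sent exactly that value; otherwise NO_MAJORITY is returned.
    """
    votes = [inputs[v] for v in eligible if v in inputs]
    if not votes:
        return NO_MAJORITY
    value, count = Counter(votes).most_common(1)[0]
    return value if 2 * count > len(eligible) else NO_MAJORITY

# Hypothetical scenario: a trustworthy source sends "m"; r1..r3 are
# trustworthy RMUs, r4 is symmetric-faulty, r5 is asymmetric-faulty.
eligible = {"r1", "r2", "r3", "r4", "r5"}
seen_by_biu_a = {"r1": "m", "r2": "m", "r3": "m", "r4": "x", "r5": "y"}
seen_by_biu_b = {"r1": "m", "r2": "m", "r3": "m", "r4": "x", "r5": "z"}

result_a = majority_vote(seen_by_biu_a, eligible)
result_b = majority_vote(seen_by_biu_b, eligible)
print(result_a, result_b)
```

In this hypothetical scenario the three trustworthy RMUs outnumber the two untrustworthy ones, so both BIUs obtain the value sent by the source even though the asymmetric RMU shows them different values.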

Agreement on the voting result in process P2: For a given source BIU, if $|\mathrm{Asym\_EV}_{P1,i}| = 0$ at each
trustworthy RMU i or $|\mathrm{Asym\_EV}_{P2,j}| = 0$ at each trustworthy BIU j, then the voting results for process P2 at
the trustworthy BIUs agree exactly.

Proof: The source BIU may be asymmetric or otherwise. Two cases are considered. 
Case 1: $|\mathrm{Asym\_EV}_{P1,i}| = 0$: If the source BIU is asymmetric, then it is not eligible in process P1 at
the trustworthy RMUs and the result is SOURCE_ERROR. In process P2 at the trustworthy BIUs, the
trustworthy RMUs are a majority of the eligible voters. Therefore, the result
of the vote in process P2 is SOURCE_ERROR.

If the source BIU is not asymmetric, then all the trustworthy RMUs agree on whether the source is 
eligible. If the source is ineligible, the result for process P1 is SOURCE_ERROR. If the source is
eligible, the trustworthy RMUs received the same message and the result for process P1 is equal to the
received message. Therefore, the trustworthy RMUs agree on the result for process P1. Since the
trustworthy RMUs are a majority of eligible voters for process P2 at the trustworthy BIUs, the voting 
results in process P2 exactly agree. 

Case 2: $|\mathrm{Asym\_EV}_{P2,j}| = 0$: Since the set of eligible voters for process P2 at each trustworthy BIU does
not include asymmetric RMUs, and the diagnostic system is required to satisfy the property of agreement 
for non-asymmetric defendants, the sets of eligible voters are equal. Since the voting functions will have 
the same inputs and eligibility set, the voting results in process P2 exactly agree. 

The conditions of this agreement property are protocol assumptions for non-convicted sources. 

In process P2, the BIUs diagnose the source BIU based on the result of the vote. A vote result of 
NO_MAJORITY indicates that the source BIU has behaved asymmetrically. A vote result of 
SOURCE_ERROR indicates that at least one trustworthy RMU considers the source BIU to be 
untrustworthy. Both of these results are sufficient basis for the BIUs in process P2 to accuse the source 
BIU. Note that, independently of whether the source BIU is convicted or not, if the agreement property 
conditions are satisfied, then the trustworthy BIUs in process P2 will agree on their local diagnostic 
assessment of the source BIU. 

If the source BIU is asymmetric and not accused by some trustworthy RMUs, and $|\mathrm{Asym\_EV}_{P2,j}| \neq 0$ at
some trustworthy BIUs, it is possible not to have agreement on the voting results in process P2 at the 
trustworthy BIUs. However, if the source BIU is convicted, the PEs will receive the same result of 
SOURCE_ERROR. 

A disagreement with the result of the vote in process P2 indicates that an error occurred somewhere in 
the path beginning at the source BIU, passing through the disagreeing RMU, and ending at the voting 
BIU. The detection of disagreement is not sufficient evidence to determine which node is responsible for 
the error. Because the diagnostic system is required to satisfy the property of correctness, the most that 
can be done by a receiving BIU in process P2 is to levy a suspicion against the source BIU and the 
relaying RMU. 
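
The local diagnosis performed by a BIU after the P2 vote can be sketched as follows. This is an illustrative reading of the preceding paragraphs, not the ROBUS-2 implementation: the function name diagnose_after_p2, the data layout, the treatment of a missing input as a disagreement, and the choice to record no suspicions when the vote yields NO_MAJORITY are all assumptions.

```python
NO_MAJORITY = "NO_MAJORITY"
SOURCE_ERROR = "SOURCE_ERROR"

def diagnose_after_p2(source, vote_result, inputs, eligible):
    """Local diagnosis at a BIU after the P2 vote (illustrative).

    Returns (accuse_source, suspicions), where suspicions is a set of
    (source BIU, relaying RMU) pairs.
    """
    # NO_MAJORITY or SOURCE_ERROR is sufficient basis to accuse the source.
    accuse_source = vote_result in (NO_MAJORITY, SOURCE_ERROR)
    suspicions = set()
    # Assumption: disagreement is only evaluated when the vote produced a
    # result that an individual input could match.
    if vote_result != NO_MAJORITY:
        for rmu in eligible:
            # Assumption: a missing input counts as a disagreement.
            if inputs.get(rmu) != vote_result:
                # The error lies somewhere on the path source -> rmu -> this
                # BIU, so both nodes are suspected but neither is accused.
                suspicions.add((source, rmu))
    return accuse_source, suspicions

accuse, suspects = diagnose_after_p2(
    source="b1",
    vote_result="m",
    inputs={"r1": "m", "r2": "m", "r3": "x"},
    eligible={"r1", "r2", "r3"},
)
print(accuse, suspects)
```

Here the hypothetical source b1 is not accused because the vote produced a valid message, while the pair (b1, r3) is recorded as a suspicion because the value relayed by r3 disagrees with the vote result.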


E.1.3. Accusation Exchange protocol 

The local diagnostic system of each ROBUS node collects accusations and suspicions about nodes of 
the same kind and the opposite kind. Without the Accusation Exchange protocol, the BIUs would receive 
information about nodes of their own kind only from the execution of the PE Broadcast protocol, and the 
RMUs would not receive any information about their own kind. Furthermore, without the Accusation 
Exchange protocol, the BIUs would be able to observe other BIUs only when (and if) the communication 
schedule allows it. This constrains the total number of failures that can be tolerated by the bus over a 
time interval while still maintaining the ability to satisfy the required properties for the generation of 
accusations based on the processing of suspicions. The Accusation Exchange protocol allows the
processing of suspicions based on the latest available local diagnostic assessments. 

Figure E.1 illustrates the message flow graph. Unlike all of the other protocols, the content of the
message broadcast in process P1 is independent of the result computed in that process. The messages in
stage 1 are the local accusations by the BIUs against the RMUs, and the messages in stage 2 are the 
accusations by RMUs against BIUs. Stages 1 and 2 of this protocol are closely related to the processing 
of SOURCE_ERROR messages in stage 2 of the PE Broadcast protocol. 



Figure E.1: Message flow graph for the PE Broadcast protocol

For this protocol, it is assumed that the set of eligible voters for processes P1 and P2 at the trustworthy
nodes contains more trustworthy sources than untrustworthy ones. That is:

$|\mathrm{Twy\_EV}_{P1,i}| > |\mathrm{Sym\_EV}_{P1,i}| + |\mathrm{Asym\_EV}_{P1,i}|$ for process P1 at each trustworthy RMU i, and

$|\mathrm{Twy\_EV}_{P2,j}| > |\mathrm{Sym\_EV}_{P2,j}| + |\mathrm{Asym\_EV}_{P2,j}|$ for process P2 at each trustworthy BIU j.

We consider stage 2. Similar properties hold for stage 1. 

Accusation correctness: For each BIU defendant, the result of the bit vote in process P2 at the 
trustworthy BIUs is TRUE only if the defendant is indeed untrustworthy. 

Proof: The correctness property of the accusation generation mechanisms ensures that the trustworthy 
RMUs do not accuse trustworthy defendants. Since the trustworthy RMUs form a majority of eligible 
voters for process P2 at each trustworthy BIU, the result of the bit vote for these defendants is FALSE. 

Agreement for non-asymmetric BIU defendants: For each non-asymmetric BIU defendant, the 
voting results for P2 at the trustworthy BIUs exactly agree. 

Proof: If the BIU defendant is non-asymmetric, the trustworthy RMUs agree on the value of their 
accusation variables for the defendant. Since the trustworthy RMUs are a majority among the eligible 
voters in process P2, the vote results for the defendant at the trustworthy BIUs agree. 

Since the broadcast accusations are the result of diagnosis based on all the observations up to the time 
of the transmission, agreement on the voting results for process P2 is possible even if the defendant is
asymmetric. The following property captures this. 
Agreement for a generic BIU defendant: If the trustworthy RMUs agree on the value of their
accusation variables for the defendant or $|\mathrm{Asym\_EV}_{P2,j}| = 0$ at each trustworthy BIU j, then the
trustworthy BIUs agree on the voting result for the defendant in process P2.

Proof: Two cases are considered. If the trustworthy RMUs agree on the value of their accusation 
variables for the defendant, then the assumption that the trustworthy RMUs form a majority of eligible 
voters in process P2 ensures that the vote results for the defendant agree. 

If the eligible voters for process P2 at the trustworthy BIUs do not include asymmetric RMUs, the 
BIUs have the same sets of received messages and eligible voters. Therefore, their voting results for the 
defendant agree. 

It is assumed that the conditions of this property are satisfied for non-convicted BIU defendants.
These conditions are essentially the same as the agreement conditions for the PE Broadcast protocol. 

Given that the inputs to the bit voter are Boolean (i.e., the values are TRUE or FALSE), if the number
of eligible voters is odd, there is always an exact-match majority among the inputs to the bit voter. If the 
number of eligible voters is even and there is not an exact-match majority among the voter inputs, then 
half of the inputs are TRUE and the other half are FALSE. Since it is assumed that the trustworthy 
RMUs form a majority among the eligible voters, then at least one trustworthy RMU accused the 
defendant. Therefore, a bit vote result of TRUE (i.e., accused) in this case conforms to the required 
property of accusation correctness. This is similar to the NO_MAJORITY result in stage 2 of the PE 
Broadcast protocol. 
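
The handling of an even split can be made concrete with a small sketch. The function name bit_vote and the rule used here, namely that the result is TRUE when at least half of the eligible voters sent TRUE, are assumptions inferred from the paragraph above; treating a missing input as FALSE is a further assumption. The actual bit voter is defined in the main body of this document.

```python
def bit_vote(inputs, eligible):
    """Boolean vote over the accusation bits of the eligible voters.

    Assumption: TRUE is returned when at least half of the eligible voters
    sent TRUE, so an even split resolves to TRUE (accused).
    """
    votes = [bool(inputs.get(v, False)) for v in eligible]
    return len(votes) > 0 and 2 * sum(votes) >= len(votes)

# Odd number of eligible voters: there is always an exact-match majority.
odd_result = bit_vote({"r1": False, "r2": False, "r3": True},
                      {"r1", "r2", "r3"})

# Even split among four eligible voters. Under the assumption that the
# trustworthy RMUs are a majority (at least three of four), at least one
# of the two TRUE inputs comes from a trustworthy RMU.
split_result = bit_vote({"r1": True, "r2": True, "r3": False, "r4": False},
                        {"r1", "r2", "r3", "r4"})
print(odd_result, split_result)
```

With these hypothetical inputs the odd case yields FALSE, because only one of three voters accuses, and the split case yields TRUE, which is consistent with accusation correctness as long as the majority assumption holds.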

In addition, similarly to the PE Broadcast protocol, a disagreement with the result of a bit vote in 
process P2 for a particular BIU defendant indicates that an error occurred somewhere in the path from the 
defendant, through the disagreeing RMU, to the voting BIU. This observation is sufficient evidence to 
raise suspicions against the defendant and the disagreeing RMU, but it is not enough to accuse either of 
them. If the result of the bit vote is TRUE, then it is known that the defendant is untrustworthy but it does 
not necessarily excuse the disagreeing RMU. 
Appendix F. Analysis of the diagnostic system


This appendix examines various aspects of the diagnostic system. The properties of the suspicion- 
based accusation-generation process are presented. The Collective Diagnosis protocol is analyzed using 
the generic protocol theory presented in Appendix A. In addition, the characteristics of the clique 
membership are examined using the known properties of the diagnostic system. 

The diagnostic system has two accusation generation mechanisms: immediate and suspicion-based. 
The immediate accusations are based on error detection for which there is only one possible culprit, 
other than the observer itself. Suspicion-based accusations are based on the detection of errors with 
multiple possible culprits and the processing of accumulated suspicions to identify untrustworthy nodes 
based on accumulated observations. The immediate-accusation mechanisms are known by design to 
comply with the required properties of correctness and agreement for non-asymmetric defendants. This 
aspect of the diagnostic system is not explored further. The suspicion-based accusation generation 
process is examined next. 


F.1. Suspicion-based accusations

This section examines the properties of the suspicion-based accusations for a given collective
diagnostic interval. The expression not currently convicted refers to a node that is classified as not 
convicted during the current interval. A not currently convicted node was not convicted during the 
execution of the Collective Diagnosis protocol in the immediately preceding collective diagnostic 
interval. In contrast, a currently convicted defendant is a convicted node during the current interval. 

Figure F.1 illustrates the suspicions matrix. As described in Section 4 of this document, every node
records its suspicions in a two-dimensional matrix in which the rows correspond to the nodes of the same
kind as the observing node and the columns correspond to the nodes of the opposite kind. 0 and Q.
denote the number of nodes of the same kind and the opposite kind, respectively. The suspicions-matrix 
is processed by bit vote operations for each row and column of the matrix. 
Figure F.1: Suspicions matrix and generated accusations


The suspicions are recorded only during the PE Communication mode, which is the only mode in 
which the observers gather evidence about the behavior of nodes of their own kind. The BIUs record 
suspicions based on observations during the execution of the PE Broadcast and Accusation Exchange 
protocols. The RMUs record suspicions only during the Accusation Exchange protocol. A cell in the 
suspicions matrix is asserted when there is evidence that one or both of the corresponding nodes are 
untrustworthy.

The analysis for the processing at the RMUs is a special case of the analysis for the BIUs. We 
examine the processing at BIU nodes only. 


F.1.1. Processing suspicions against nodes of the opposite kind 

The suspicions against nodes of the opposite kind correspond to the columns of the suspicions matrix. 
These suspicions are processed by a separate bit vote operation for each column. The eligible voters are 
the trusted nodes of the same kind. Thus, every row corresponding to a distrusted node of the same kind 
is removed from consideration in the bit vote operations. The Collective Diagnosis protocol ensures 
agreement among all the trustworthy nodes on the conviction results. In addition, the properties of the PE 
Communication protocols ensure that, for each not currently convicted node of the same kind, the 
trustworthy BIUs agree on their accusations performed up to the time at which the suspicions matrix is 
processed. Therefore, the trustworthy BIUs agree on the set of eligible voters. Furthermore, it is assumed 
that the set of eligible voters includes more trustworthy than untrustworthy nodes. Let $\mathrm{EV}_{SK,i}$ denote the
set of eligible voters of the same kind at BIU i. $\mathrm{Twy\_EV}_{SK,i}$, $\mathrm{Sym\_EV}_{SK,i}$, and $\mathrm{Asym\_EV}_{SK,i}$ denote the
sets of trustworthy, symmetric, and asymmetric eligible voters, respectively. Thus:

$|\mathrm{Twy\_EV}_{SK,i}| > |\mathrm{Sym\_EV}_{SK,i}| + |\mathrm{Asym\_EV}_{SK,i}|$ for each trustworthy BIU i.

We need to show that the accusation results generated by the bit vote operations satisfy the required 
properties of correctness and agreement for non-asymmetric defendants. 

For a particular defendant of the opposite kind, if the result of the bit vote is TRUE, then, for the 
suspicions-matrix column corresponding to the defendant, at least one cell corresponding to a trustworthy 
node of the same kind is TRUE. This means that at least once during the PE Communication mode the 
value received from the given defendant disagreed with the result of a vote involving a trustworthy node 
of the same kind. The validity property for the PE Broadcast and Accusation Exchange protocols
ensures that the result of the vote is the correct value. Therefore, the given defendant is indeed 
untrustworthy. 

If the defendant is non-asymmetric, the trustworthy BIUs agree on their observations of the defendant. 
Since the trustworthy BIUs always agree on the vote results in the PE Communication mode involving 
not currently convicted nodes of their own kind, the trustworthy BIUs agree on the suspicion entries 
corresponding to the eligible voters. Therefore, the trustworthy BIUs will agree on the bit vote results for 
the defendant. 


F.1.2. Processing suspicions against nodes of the same kind 

The suspicions against nodes of the same kind correspond to the rows of the suspicions matrix. The 
eligible voters for the bit vote operations along the rows are the trusted nodes of the opposite kind. Every 
column corresponding to a distrusted node of the opposite kind is removed from consideration in the bit 
vote operations. The properties of the diagnostic system ensure that the trustworthy BIUs agree on their 
accusations for non-asymmetric nodes of the opposite kind. Furthermore, it is assumed that the set of 
eligible voters includes more trustworthy than untrustworthy nodes. Let $\mathrm{EV}_{OK,i}$ denote the set of eligible
voters of the opposite kind at BIU i. $\mathrm{Twy\_EV}_{OK,i}$, $\mathrm{Sym\_EV}_{OK,i}$, and $\mathrm{Asym\_EV}_{OK,i}$ denote the sets of
trustworthy, symmetric, and asymmetric eligible voters, respectively. Thus:

$|\mathrm{Twy\_EV}_{OK,i}| > |\mathrm{Sym\_EV}_{OK,i}| + |\mathrm{Asym\_EV}_{OK,i}|$ for each trustworthy BIU i.

For a particular defendant of the same kind, if the result of the bit vote is TRUE, then, for the 
suspicions-matrix row corresponding to the defendant, at least one cell corresponding to a trustworthy 
node of the opposite kind is TRUE. This means that at least once during the PE Communication
mode the value received from a trustworthy node of the opposite kind disagreed with the result of a vote
involving the defendant. The messages transmitted by a trustworthy RMU in the PE Broadcast and
Accusation Exchange protocols are a direct result of the messages received from the defendant and the
local diagnostic assessment. Therefore, the node responsible for the disagreements is the defendant, 
which makes it untrustworthy. Thus, the property of correctness is satisfied. 

If the defendant is non-asymmetric, the trustworthy RMUs agree on their observations and local 
diagnosis of the defendant. Furthermore, since the trustworthy RMUs considered for the processing of 
suspicions are always part of the majority among the eligible voters in the protocols of the PE 
Communication mode, the values received from them never disagree with the result of the vote. 
Therefore, for the suspicions-matrix row corresponding to the defendant, the cells corresponding to the 
trustworthy RMUs are FALSE. Consequently, the bit vote result will be FALSE and no new accusations 
are raised against the defendant. Since the trustworthy BIUs always agree on their local diagnostic 
assessment of trustworthy RMUs, a non-asymmetric defendant of the same kind will not be accused at
any of them as a result of processing the suspicions matrix.

Consider a not-currently-convicted defendant of the same kind. The properties of the PE 
Communication protocols ensure that the trustworthy BIUs agree on their vote results for the given kind 
of defendant. Therefore, for a not-currently-convicted defendant, the trustworthy BIUs agree on the 
suspicions-matrix entries corresponding to non-asymmetric nodes of the opposite kind. The bit vote 
results at the trustworthy BIUs agree if the suspicions-matrix entries corresponding to the trustworthy 
nodes of the opposite kind agree or there are no asymmetric eligible voters at any of the trustworthy 
BIUs. To show this, we consider the two conditions separately. If the first condition were true, then the 
bit vote results at the trustworthy BIUs would agree because the trustworthy RMUs form a majority 
among the eligible voters. If the second condition were true, the bit vote results would agree because 
there would be agreement on the inputs to the vote and the eligible voters. 

Note that if the required conditions for the agreement properties of the PE Communication protocols 
hold for a currently convicted defendant of the same kind, then the processing of suspicions for that 
defendant has the same properties as for a not currently convicted one. 
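
The row and column processing described in Sections F.1.1 and F.1.2 can be sketched as follows. The function names, the use of integer indices for nodes, and the bit vote rule (TRUE when at least half of the counted cells are TRUE) are assumptions made for illustration and are not taken from the ROBUS-2 design.

```python
def bit_vote(bits):
    """Assumed rule: TRUE when at least half of the counted bits are TRUE."""
    bits = list(bits)
    return len(bits) > 0 and 2 * sum(bits) >= len(bits)

def process_suspicions(matrix, trusted_same, trusted_opposite):
    """matrix[s][o] is True when the observer suspects same-kind node s
    and/or opposite-kind node o.

    Returns (accused_same, accused_opposite) as lists of node indices.
    """
    n_same = len(matrix)
    n_opp = len(matrix[0]) if matrix else 0
    # Columns: one vote per opposite-kind defendant. The voters are the
    # trusted same-kind nodes; rows of distrusted nodes are dropped.
    accused_opposite = [
        o for o in range(n_opp)
        if bit_vote(matrix[s][o] for s in range(n_same) if s in trusted_same)
    ]
    # Rows: one vote per same-kind defendant. The voters are the trusted
    # opposite-kind nodes; columns of distrusted nodes are dropped.
    accused_same = [
        s for s in range(n_same)
        if bit_vote(matrix[s][o] for o in range(n_opp) if o in trusted_opposite)
    ]
    return accused_same, accused_opposite

# Hypothetical 3 x 3 matrix at a BIU: every vote involving RMU 2 showed a
# disagreement, so column 2 is fully asserted.
matrix = [
    [False, False, True],
    [False, False, True],
    [False, False, True],
]
accused_bius, accused_rmus = process_suspicions(matrix, {0, 1, 2}, {0, 1, 2})
print(accused_bius, accused_rmus)
```

In this example only RMU 2 is accused. Each BIU row contains a single asserted cell out of three, which is below the assumed threshold, so no BIU is accused.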


F.2. Collective Diagnosis protocol 

The on-line diagnosis protocol was developed by Geser and Miner. The formal verification of the 
protocol is presented in [Geser 04] . 

This section examines the Collective Diagnosis protocol for BIU defendants. The analysis of the 
protocol for the diagnosis of RMU defendants is similar. Figure F.2 illustrates the message flow graph 
for the Collective Diagnosis protocol for BIU defendants. The protocol processes are P0 through P4. The
actual protocol processes multiple defendants in parallel. To simplify the presentation, we examine the
characteristics of the protocol for a single defendant. The "def" bubble represents the defendant, and the
dashed arrow represents the diagnostic evidence collected by the RMU nodes and used to generate the
accusations against the defendant. The RMUs are direct observers of the defendant, and the BIUs are
indirect observers. Every ROBUS protocol provides an opportunity for the RMUs to collect evidence
against the defendant. The BIUs are able to observe the defendant only in the PE Communication mode. 
If the defendant is a scheduled source, the PE Broadcast protocol enables the BIUs to observe the 
messages broadcast by the defendant and relayed by the RMUs after some processing. The Accusation 
Exchange protocol enables the BIUs to observe the defendant only in terms of the accusations submitted 
by the RMUs. 


[Figure: node rows PEs, BIUs, RMUs; Stages 1 to 4; phases Agreement generation and Agreement propagation]

Figure F.2: Message flow graph for the Collective Diagnosis protocol for BIU defendants 

It is assumed that each of the sets of eligible voters for processes P1 through P4 at the trustworthy
nodes contains more trustworthy sources than untrustworthy ones. That is:

$|\mathrm{Twy\_EV}_{P1,i}| > |\mathrm{Sym\_EV}_{P1,i}| + |\mathrm{Asym\_EV}_{P1,i}|$ for process P1 at each trustworthy BIU i,

$|\mathrm{Twy\_EV}_{P2,j}| > |\mathrm{Sym\_EV}_{P2,j}| + |\mathrm{Asym\_EV}_{P2,j}|$ for process P2 at each trustworthy RMU j,

$|\mathrm{Twy\_EV}_{P3,k}| > |\mathrm{Sym\_EV}_{P3,k}| + |\mathrm{Asym\_EV}_{P3,k}|$ for process P3 at each trustworthy BIU k, and

$|\mathrm{Twy\_EV}_{P4,l}| > |\mathrm{Sym\_EV}_{P4,l}| + |\mathrm{Asym\_EV}_{P4,l}|$ for process P4 at each trustworthy RMU l.

In addition, it is also assumed that in process P1 or process P2 the eligible voters at the trustworthy
nodes do not include asymmetric sources. That is:

$|\mathrm{Asym\_EV}_{P1,i}| = 0$ at each trustworthy BIU i, or

$|\mathrm{Asym\_EV}_{P2,j}| = 0$ at each trustworthy RMU j.


F.2.1. Agreement generation phase 

We begin the analysis of the agreement generation phase by considering conviction agreement for a 
not-currently-convicted defendant. The properties of the PE Communication protocols and of the 
accusation generation mechanisms, including the processing of the suspicions matrix, ensure that the
trustworthy BIUs have agreement on the accusations generated against the defendant. 

Agreement for not-currently-convicted defendants: If the defendant is not currently convicted, the 
trustworthy RMUs agree on the voting results for process P2. 

Proof: Two cases are considered. 

Case 1: $|\mathrm{Asym\_EV}_{P1,i}| = 0$: Since all of the eligible voters for process P1 at the trustworthy BIUs are
non-asymmetric, they all agree on the voting inputs and the eligible voters. Therefore, they agree on the 
result of the bit vote. Since there is agreement on the bit vote result and on the local accusation against 
the defendant, the merge operation, a simple Boolean OR function, preserves that agreement. Stage 2 is 
an agreement propagation stage, and thus the trustworthy RMUs agree on the result of the bit vote for 
process P2. 

Case 2: $|\mathrm{Asym\_EV}_{P2,j}| = 0$: For this case, the trustworthy RMUs agree on the eligible voters and the
inputs received from them for process P2. Therefore, they also agree on the bit vote result. 

Next, consider a currently convicted defendant. In general, there is no guarantee of agreement among 
the trustworthy BIUs on the local diagnosis of a currently convicted BIU. 

Agreement for currently convicted defendants: If the defendant is currently convicted, and the 
condition that $|\mathrm{Asym\_EV}_{P1,i}| = 0$ at each trustworthy BIU i implies that the trustworthy BIUs agree on the
accusations against the defendant, then the trustworthy RMUs agree on the voting results for process P2. 

Proof: Two cases are considered. 

Case 1: $|\mathrm{Asym\_EV}_{P1,i}| = 0$: For this case, the trustworthy BIUs agree on the eligible voters and the
inputs received from them for process P1. Thus, they agree on the result of the bit vote. By the premises,
the trustworthy BIUs agree on the accusations against the defendant. The merge operation preserves that 
agreement. Stage 2 is an agreement propagation stage, and thus the trustworthy RMUs agree on the result 
of the bit vote for process P2. 

Case 2: $|\mathrm{Asym\_EV}_{P2,j}| = 0$: For this case, the trustworthy RMUs agree on the eligible voters and the
inputs received from them for process P2. Therefore, they also agree on the bit vote result. 

Conviction correctness holds irrespective of whether the defendant is currently convicted or not. 

Conviction correctness: The result of the bit vote in process P2 is TRUE only if the defendant is 
untrustworthy. 

Proof: Since the trustworthy BIUs are a majority among the eligible voters in process P2, a bit vote 
result of TRUE implies that a value of TRUE was received from at least one trustworthy BIU. The result 
of the merge operation in process P1 at a trustworthy BIU is TRUE if the local accusation is TRUE or the
result of the bit vote is TRUE. The properties of the accusation generation mechanisms ensure that the 
local accusations are correct. If the result of the bit vote is TRUE, then at least one value of TRUE was 
received from process P0 at a trustworthy RMU. The properties of the accusation generation mechanisms 
ensure that the accusations at the trustworthy RMUs are correct. Therefore, the defendant is convicted 
only if it has been accused by a trustworthy observer. 
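
The agreement generation phase for a single BIU defendant can be sketched as follows. The function names p1_at_biu and p2_at_rmu, the node labels, and the bit vote rule (TRUE when at least half of the eligible inputs are TRUE) are illustrative assumptions. Only the merge operation, a Boolean OR of the local accusation and the bit vote result, is taken directly from the text.

```python
def bit_vote(bits):
    """Assumed rule: TRUE when at least half of the counted bits are TRUE."""
    bits = list(bits)
    return len(bits) > 0 and 2 * sum(bits) >= len(bits)

def p1_at_biu(local_accusation, rmu_accusations, eligible_rmus):
    """Process P1: vote on the P0 accusations from the RMUs, then merge
    the vote result with the local accusation by Boolean OR."""
    voted = bit_vote(rmu_accusations[r] for r in eligible_rmus)
    return local_accusation or voted

def p2_at_rmu(biu_results, eligible_bius):
    """Process P2: vote on the merged results sent by the BIUs."""
    return bit_vote(biu_results[b] for b in eligible_bius)

# Hypothetical scenario: two of three trustworthy RMUs accuse the defendant
# in process P0; no BIU has a local accusation.
rmu_p0 = {"r1": True, "r2": True, "r3": False}
eligible_rmus = {"r1", "r2", "r3"}
biu_out = {
    "b1": p1_at_biu(False, rmu_p0, eligible_rmus),
    "b2": p1_at_biu(False, rmu_p0, eligible_rmus),
    "b3": False,  # hypothetical faulty BIU that reports "not accused"
}
conviction = p2_at_rmu(biu_out, {"b1", "b2", "b3"})
print(biu_out["b1"], conviction)
```

In this scenario both trustworthy BIUs output TRUE from process P1, and the P2 vote at an RMU yields TRUE despite the contrary input from the faulty BIU. A TRUE result can be traced back to an accusation by at least one trustworthy observer, in line with the conviction correctness argument.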
For a BIU defendant, the Collective Diagnosis protocol guarantees convictions for the following cases.


• The defendant is benign or symmetric untrustworthy and accused by BIUs or RMUs. 

• The defendant is asymmetric untrustworthy and accused by: (1) a subset of trustworthy RMUs that 
is at least half of the eligible voters for process P1 at each trustworthy BIU, or (2) a subset of
trustworthy BIUs that is at least half of the eligible voters for process P2 at each trustworthy 
RMU. 


F.2.2. Agreement propagation phase 

The agreement generated in the first phase is propagated in the second one irrespective of the status of 
the defendant. 

Agreement propagation: The results of the voting operations for process P3 at the trustworthy BIUs 
and for process P4 at the trustworthy RMUs are equal to the results for process P2 at the trustworthy 
RMUs. 

Proof: For process P3, exact agreement propagation follows from the fact that the trustworthy RMUs 
agree on the result for process P2 and they form a majority among the eligible voters in process P3. The 
conditions are similar for the propagation of agreement from process P3 to process P4. 
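
The propagation argument can be checked on a small example. In the sketch below, the function names and the bit vote rule (TRUE when at least half of the inputs are TRUE) are assumptions made for illustration. The receiver models either a BIU in process P3 or an RMU in process P4.

```python
def bit_vote(bits):
    """Assumed rule: TRUE when at least half of the counted bits are TRUE."""
    bits = list(bits)
    return len(bits) > 0 and 2 * sum(bits) >= len(bits)

def receive_and_vote(agreed_result, n_trustworthy, faulty_inputs):
    """Vote at one receiver: the trustworthy senders all report the agreed
    result, while the faulty senders report arbitrary bits."""
    return bit_vote([agreed_result] * n_trustworthy + list(faulty_inputs))

# Two trustworthy senders and one faulty sender: the majority assumption
# holds, so the agreed result should survive any faulty input.
outcomes = {
    (agreed, lie): receive_and_vote(agreed, 2, [lie])
    for agreed in (True, False)
    for lie in (True, False)
}
propagated = all(result == agreed for (agreed, _lie), result in outcomes.items())

# Two trustworthy and two faulty senders: the majority assumption is
# violated, and the agreed result FALSE is no longer guaranteed to survive.
violated = receive_and_vote(False, 2, [True, True])
print(propagated, violated)
```

The first check covers every combination of agreed result and faulty input for this configuration. The second shows why the assumption of a trustworthy majority among the eligible voters is needed: once it is violated, the faulty senders can force a TRUE result.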


F.3. Clique membership 

For a given node, the set of trusted nodes of the same kind and the set of trusted nodes of the opposite 
kind constitute the node’s view of the clique membership. The required properties of correctness and 
agreement for non-asymmetric defendants establish basic constraints on the trusted sets. Correctness of 
diagnosis ensures that the trustworthy nodes trust one another. Thus, at each trustworthy node, the trusted 
set includes all of the trustworthy nodes of the same kind and the opposite kind. This is a basic 
requirement for maintaining the unity of the clique. The property of agreement for non-asymmetric 
defendants ensures that the trustworthy nodes of a particular kind agree on their trust assessment for non- 
asymmetric defendants. This property is needed to ensure that the protocols are able to generate 
agreement. 

When trustworthy nodes of a particular kind have agreement on the clique membership, their 
agreement has the following characteristics. 

• They agree on trusting all trustworthy nodes, irrespective of whether they are of the same or 
opposite kind. 

• For each non-asymmetric node, they agree on either trusting it or not, irrespective of whether the 
node is of the same or opposite kind. 

• For each not-currently-convicted node of the same kind, they agree on either trusting it or not. 

• There is no certainty of agreement for asymmetric nodes that are of the opposite kind or currently 
convicted. 



When trustworthy nodes of opposing kinds have agreement on clique membership, they agree as 
follows. 

• They agree on trusting all trustworthy nodes, irrespective of whether they are of the same or 
opposite kind. 

• There is no certainty of agreement for untrustworthy nodes of any kind or status. 
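
Two of the characteristics listed above for trustworthy nodes of a particular kind can be expressed as a simple consistency check. The function name, the representation of a clique membership view as a set of trusted node labels, and the example nodes are hypothetical. The check covers only mutual trust among trustworthy nodes and agreement on non-asymmetric nodes; it does not model conviction status.

```python
def same_kind_views_consistent(views, trustworthy, asymmetric):
    """Check two clique-membership characteristics for observers of one kind.

    views:       dict mapping each observer to the set of nodes it trusts
    trustworthy: set of all trustworthy nodes (both kinds)
    asymmetric:  set of asymmetric untrustworthy nodes
    Only the views held by trustworthy observers are examined.
    """
    observers = [n for n in views if n in trustworthy]
    # Every trustworthy observer must trust every trustworthy node.
    trusts_all_good = all(trustworthy <= views[o] for o in observers)
    # The views, restricted to non-asymmetric nodes, must be identical.
    restricted = [views[o] - asymmetric for o in observers]
    agree = all(r == restricted[0] for r in restricted)
    return trusts_all_good and agree

trustworthy = {"b1", "b2", "r1", "r2"}
asymmetric = {"r3"}

# The two BIUs differ only in their assessment of the asymmetric RMU r3.
views = {
    "b1": {"b1", "b2", "r1", "r2", "r3"},
    "b2": {"b1", "b2", "r1", "r2"},
}
print(same_kind_views_consistent(views, trustworthy, asymmetric))
```

Disagreement about r3 is tolerated because r3 is asymmetric. If one of the BIUs distrusted a trustworthy node, or the BIUs differed on a non-asymmetric node, the check would fail.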
Appendix G. Analysis of startup and restart


This appendix examines the startup and restart capabilities. The mode transition graph for the ROBUS 
nodes is presented in Section 2. 

Recovery is the process of reaching the Clique Preservation mode from a disabled or failed state. 
Startup is a recovery triggered by a power-on enable. Restart is triggered by the detection of a local 
failure or a bus failure. The recovery process involves the following sequence of steps: reset, self-test, 
search for an active clique, and attempt to join or form a clique. Four basic recovery cases are defined 
based on the trigger and whether there is a clique present during recovery. Table G.1 illustrates the
relation among the recovery cases. The recovery trigger can be a power-on enable or the detection of a 
local failure or a bus failure. A clique-join case is a recovery with a clique already active on the bus. 
There is no active clique for a clique-initialization case. 


                          Trigger
Active Clique Present     Power-On Enable     Failure Detection
Yes                       Join                Rejoin
No                        Initialization      Re-initialization


Table G.1: Recovery cases
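
The classification in the table can be written as a small lookup. The names Trigger, RECOVERY_CASES, and recovery_case are hypothetical and serve only to restate the table; they do not describe how the ROBUS nodes implement recovery.

```python
from enum import Enum

class Trigger(Enum):
    POWER_ON_ENABLE = "power-on enable"
    FAILURE_DETECTION = "failure detection"

# Keys are (trigger, active clique present); values are the recovery cases.
RECOVERY_CASES = {
    (Trigger.POWER_ON_ENABLE, True): "Join",
    (Trigger.FAILURE_DETECTION, True): "Rejoin",
    (Trigger.POWER_ON_ENABLE, False): "Initialization",
    (Trigger.FAILURE_DETECTION, False): "Re-initialization",
}

def recovery_case(trigger, clique_present):
    """Return the recovery case for a trigger and clique presence."""
    return RECOVERY_CASES[(trigger, clique_present)]

print(recovery_case(Trigger.FAILURE_DETECTION, True))
```

The two clique-join cases are those with an active clique present (Join and Rejoin), and the two clique-initialization cases are those without one.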


G.1. Recovery limitations

The trustworthy nodes enter the Clique Preservation mode synchronized and with agreement on the 
diagnostic state. Highly deterministic time-triggered behavior in this mode enables it to have 
substantially robust fault-tolerance. The Clique Join mode shares similar advantages. Other major modes 
are less robust. 

The most important characteristic for the Self-Test mode is the coverage of the test. Ideally, the 
coverage is sufficiently high to detect most faults, especially permanent faults, and the nodes are 
implemented in such a way that a node will not exit this mode unless it is fault-free. Such behavior 
essentially corresponds to a fail-stop on recovery, which increases the chances that other good recovering 
nodes will successfully reach the Clique Preservation mode. 

The Clique Detection mode consists of attempting to acquire the state of a clique while simultaneously 
monitoring for the absence of one. The effectiveness of this mode is limited by the ability of recovering
nodes to correctly diagnose observed nodes without referencing their local state or comparing messages
received from different nodes. The error detection and diagnosis mechanisms are less effective without 
these references. In addition, accusations asserted during the Clique Detection mode could last two or 
three times the duration of accusations made in the Clique Preservation mode. This can result in 
scenarios in which trustworthy nodes newly arrived to a clique are accused. Thus, there is a chance of 
false positive and false negative diagnoses in the Clique Detection mode. If a clique is present, successful 
diagnosis and state acquisition requires that a sufficient number of untrustworthy nodes are removed from 
the trusted set such that a majority of trusted nodes are trustworthy. Successful detection that no clique is 
present during recovery requires that the recovering node correctly diagnose that there are no trustworthy 
nodes of one kind or another. The Clique Detection mode as presented in Section 7 has the special 
vulnerability that, if the set of eligible voters does not have more trustworthy nodes than untrustworthy
ones, the Synchronization Capture protocol may not correctly synchronize the local time and may not 
even finish at all. Such a violation of the protocol assumptions can occur not only due to the inability of 
Local Diagnosis Acquisition and Frame Synchronization to identify a sufficient number of untrustworthy 
nodes, but also if there is a decrease in the number of trustworthy clique members after Synchronization 
Capture is started. 

It is a requirement that a group of recovering nodes must have a relative local-time skew that remains 
within a known bound during Initial Diagnosis and Initial Synchronization. Compliance with the 
bounded-skew requirement enables the nodes to execute Initial Diagnosis with synchronous 
communication even if the skew bound is large. The specification of the time interval for reception, the 
number of expected messages, and the message content for Initial Diagnosis enhances the effectiveness of 
its error detection and diagnosis. The success of Initial Synchronization is dependent on the validity of 
the actual local-time skew and the eligible voter sets. Similarly to the Synchronization Capture protocol 
in the Clique Detection mode, the Initial Synchronization protocol presented in Section 9 has the 
vulnerability that a violation of the protocol assumptions can result in an incorrectly synchronized local 
time or a state of indefinite suspended activity. 

The limitations of the diagnostic system in the Clique Detection and Clique Initialization modes, and 
the requirement of having a known bound on the relative local-time skew in the Clique Initialization 
mode constrain the effectiveness of the ROBUS recovery scheme. Providing a comprehensive design that 
ensures a successful recovery for all possible scenarios is beyond the scope of this ROBUS version. The 
ROBUS recovery capability is intended to handle scenarios of independent and non-overlapping recovery 
cases. For example, proper handling of scenarios in which a set of good nodes is initializing a new clique 
while another good node is simultaneously trying to join a clique is not considered essential for the 
design. Although there are circumstances for which such scenarios eventually result in a clique that 
includes all of the recovering nodes, in general, the recovery cases are only considered to occur 
separately. The robust diagnostic capabilities of the ROBUS nodes should allow them to detect a failed 
recovery and initiate a retry, thus increasing the chances of successful recovery. Analysis of the success 
rate for all possible recovery scenarios is beyond the scope of this appendix. 

The response of the ROBUS nodes for the previously mentioned failure modes of the Synchronization 
Capture and Initial Synchronization protocols can be improved by adding a pair of error checks. The first 
one is a timeout check to ensure that the nodes eventually exit out of the synchronization protocols even if 
the assumptions are not satisfied. A timeout check resource can also be used for a check on the 
resynchronization period in the Clique Preservation and Clique Join modes. The second additional check 
for Synchronization Capture and Initial Synchronization is a comparison between the number of eligible 
voters at the beginning of the protocol and the number of eligible voters at the end. This requires the 
execution of an error detection and diagnosis activity in parallel with the synchronization protocols. For 
the Synchronization Capture protocol, this diagnostic activity can be realized by continuing the Local 
Diagnosis Acquisition checks during Synchronization Acquisition. For the Initial Synchronization 
protocol, the diagnostic activity may consist of custom checks based on the expected message pattern for 
each opposite kind source during this protocol. For Synchronization Acquisition and Initial 
Synchronization, it is assumed that the number of trustworthy eligible voters is greater than the number of 
untrustworthy ones, and that the trustworthy voters will remain so during the execution of the protocols. 
If fewer than a majority of the initially eligible voters satisfy the eligibility conditions at the end of the 
protocol, then the protocol assumptions have been violated and a failure detection can be asserted. 
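
The two additional checks described above can be sketched as follows. This is only an illustrative sketch: the `SyncOutcome` record, its field names, and the `timeout_ticks` parameter are hypothetical and do not come from the ROBUS design. It shows the timeout check and the comparison of the eligible voters at the start and at the end of a synchronization protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncOutcome:
    """Hypothetical record of one run of a synchronization protocol."""
    finished: bool               # True if the synchronization reset was reached
    elapsed_ticks: int           # local clock ticks spent in the protocol
    voters_at_start: frozenset   # eligible voters when the protocol began
    voters_at_end: frozenset     # voters that satisfy eligibility at the end


def sync_failure_detected(outcome: SyncOutcome, timeout_ticks: int) -> bool:
    """Return True if either additional check flags a failed synchronization."""
    # Check 1 (timeout): the protocol did not finish, or overran its budget.
    if not outcome.finished or outcome.elapsed_ticks > timeout_ticks:
        return True
    # Check 2 (voter comparison): fewer than a majority of the initially
    # eligible voters still satisfy the eligibility conditions.
    still_eligible = outcome.voters_at_start & outcome.voters_at_end
    return 2 * len(still_eligible) <= len(outcome.voters_at_start)
```

With five initially eligible voters, for example, the second check asserts a failure when two or fewer of them remain eligible at the end of the protocol.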

The triggering of recovery is modeled as discrete distributed events that involve a particular set of 
nodes where recovery is triggered during a given finite time interval. Each instance of these discrete 
distributed events is referred to as a recovery-trigger event and is characterized by a precision and a set 
of recovering nodes. The relative skew of a recovery trigger between two recovering nodes is the real 
time elapsed between the triggering of the recovery at the two nodes. The precision of a recovery-trigger 
event is the largest relative skew of the recovery trigger for the set of recovering nodes. The recovery 
process for a particular recovery-trigger event is the activity on the bus from the start of the event until 
the completion of the recovery. 
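
As a small illustration of these definitions, the sketch below computes the precision of a recovery-trigger event from per-node trigger times and checks that successive events are far enough apart. The function names, the dictionary representation, and the `recovery_duration` parameter are assumptions made for this example only.

```python
from itertools import combinations


def relative_skew(t_a: float, t_b: float) -> float:
    """Real time elapsed between the recovery triggers at two nodes."""
    return abs(t_a - t_b)


def precision(trigger_times: dict) -> float:
    """Largest relative skew over all pairs of recovering nodes.

    trigger_times maps a (hypothetical) node name to the real time, in
    seconds, at which recovery was triggered at that node.
    """
    times = list(trigger_times.values())
    return max((relative_skew(a, b) for a, b in combinations(times, 2)),
               default=0.0)


def events_non_overlapping(event_starts, recovery_duration: float) -> bool:
    """True if each recovery process can complete before the next event starts."""
    starts = sorted(event_starts)
    return all(b - a >= recovery_duration for a, b in zip(starts, starts[1:]))
```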

The precision of a recovery-trigger event is an important determinant of the relative local-time skew 
during Initial Diagnosis and Initial Synchronization in the Clique Initialization mode. Other relevant 
factors are the duration of the Self-Test and Clique Detection major modes, and the duration of the Initial 
Diagnosis and Initial Synchronization minor modes. Once a recovery is triggered, a certain amount of 
time is required for the bus to return to normal steady-state operation. The time between recovery-trigger 
events should be large enough so the recovery process of one event is complete before the next event 
arrives. For the design and analysis of the ROBUS recovery scheme, it is assumed that there is a known 
bound for the precision of the recovery-trigger events, and that the events occur sufficiently apart so that 
the recovery processes are effectively independent and non-overlapping. 


G.2. Clique initialization 

We examine the timing aspects of the clique-initialization recovery cases, which include initialization 
with a power-on enable recovery trigger and re-initialization triggered by the detection of a failure. For 
these cases, a clique is not present on the bus during recovery, and thus the major mode path through the 
Clique Initialization mode is followed to reach the Clique Preservation mode. 


G.2.1. Power-on enable 

It is assumed that the nodes enter the Self-Test mode immediately after power-on enable. $\delta_{POE}$ denotes 
the actual duration of the time interval within which the nodes are enabled, measured in units of seconds. 
$\delta_{POE}|_{\max}$ denotes the upper bound for $\delta_{POE}$. $\pi_{POE}$ denotes the upper bound on the relative time skew at 
power-on enable, measured in nominal clock ticks. $\tau_0$ denotes the nominal duration of a clock tick 
measured in seconds. $\delta_{POE}$ and $\pi_{POE}$ are related as follows: 

$$\pi_{POE} = \delta_{POE}|_{\max} / \tau_0 \tag{G.1}$$


G.2.2. Local failure or bus failure 

$\delta_{FCP}$ denotes the actual duration of a fault-causing phenomenon measured in units of seconds. We 
assume that the duration of the fault-causing phenomenon as experienced by individual nodes can be 
effectively 0. (Note that $\delta_{FCP} = 0$ means that the phenomenon has a negligibly small duration, not that the 
phenomenon has no effect.) So: 

$$\delta_{FCP}|_{\min} = 0 \tag{G.2}$$

$\delta_{FCP}|_{\max}$ depends on the characteristics of the fault-causing phenomenon for which the design is targeted. 

$\Delta_{FD}$ denotes the actual duration of the failure-detection delay measured in local clock ticks. We 
assume that it is possible for a node to detect a failure condition immediately. $\Delta_{FD}|_{\max}$ is 
implementation-dependent. $\delta_{FD}$ denotes the actual duration of the failure-detection delay measured in 
nominal clock ticks. 

$$\delta_{FD}|_{\min} = 0 \tag{G.3}$$

$$\delta_{FD}|_{\max} = (1 + \rho_0)\,\Delta_{FD}|_{\max} \tag{G.4}$$


Let $t_{FCP,0}$ denote the time at which the fault-causing phenomenon begins. Let $t_{restart,l}$ and $t_{restart,h}$ denote 
the earliest and latest times, respectively, at which nodes affected by the fault-causing phenomenon enter 
the Self-Test mode. 

$$\begin{aligned} t_{restart,l} &= t_{FCP,0} + \delta_{FCP}|_{\min} + \delta_{FD}|_{\min} \\ &= t_{FCP,0} \end{aligned} \tag{G.5}$$

And: 

$$t_{restart,h} = t_{FCP,0} + \delta_{FCP}|_{\max}/\tau_0 + (1 + \rho_0)\,\Delta_{FD}|_{\max} \tag{G.6}$$

Let $\pi_{restart}$ denote the upper bound on the relative time skew when entering the Self-Test mode for 
restart, measured in nominal clock ticks. 

$$\begin{aligned} \pi_{restart} &= t_{restart,h} - t_{restart,l} \\ &= \delta_{FCP}|_{\max}/\tau_0 + (1 + \rho_0)\,\Delta_{FD}|_{\max} \end{aligned} \tag{G.7}$$
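
The skew bounds at entry to the Self-Test mode, (G.1) for power-on enable and (G.7) for restart, can be evaluated with a sketch such as the one below. The function and parameter names are illustrative, and any numeric values passed in are placeholders rather than ROBUS design values.

```python
def poe_skew_bound(delta_poe_max_s: float, tau0_s: float) -> float:
    """(G.1): power-on enable skew bound in nominal clock ticks."""
    return delta_poe_max_s / tau0_s


def restart_skew_bound(delta_fcp_max_s: float, tau0_s: float,
                       rho0: float, Delta_fd_max: float) -> float:
    """(G.7): restart skew bound in nominal clock ticks.

    delta_fcp_max_s : bound on the fault-causing phenomenon duration (seconds)
    tau0_s          : nominal duration of a clock tick (seconds)
    rho0            : bound on the clock drift rate
    Delta_fd_max    : bound on the failure-detection delay (local clock ticks)
    """
    return delta_fcp_max_s / tau0_s + (1 + rho0) * Delta_fd_max
```

For instance, a 1 ms phenomenon with a 1 µs tick, a drift bound of 1e-4, and a 100-tick detection delay gives a restart skew bound of about 1100.01 nominal ticks.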


G.2.3. Self-Test mode 

We want to determine the duration of the Self-Test mode and the bound on the relative skew upon 
exiting this mode. 


G.2.3.1. Duration of the Self-Test mode 

The Local Upset Abatement Delay (LUAD) for a transient-fault scenario is defined as the delay from 
the time the fault-causing phenomenon reaches a node until the node has regained control of its local 
operation. Local regaining of control is assumed to occur after the node has detected the failure 
condition, at which time the node disables its broadcast outputs and transitions to the Self-Test mode. In 
the Self-Test mode, the node first performs a full local reset and then begins the execution of the self-test 
procedure. This local reset activity should cover the Communication and Computation modules. The 
duration of the reset is implementation-dependent. Let $\delta_{LUAD}$ denote the actual duration of the Local 
Upset Abatement Delay, measured in units of nominal clock ticks. 

$$\begin{aligned} \delta_{LUAD}|_{\max} &= t_{restart,h} - t_{FCP,0} \\ &= \delta_{FCP}|_{\max}/\tau_0 + (1 + \rho_0)\,\Delta_{FD}|_{\max} \end{aligned} \tag{G.8}$$

The Observed Upset Abatement Delay (OUAD) for a transient-fault scenario is defined as the delay 
from the time the fault-causing phenomenon begins until the affected nodes can be consistently 
recognized by all direct observers as being untrustworthy. Note that this delay is defined with respect to 
the effects perceived by the observing nodes. Thus, the reception delay must be taken into consideration. 
Let $\delta_{OUAD}$ denote the actual duration of the Observed Upset Abatement Delay, measured in units of 
nominal clock ticks. 

$$\begin{aligned} \delta_{OUAD}|_{\max} &= \delta_{LUAD}|_{\max} + \Gamma_{PP,h} \\ &= \delta_{FCP}|_{\max}/\tau_0 + (1 + \rho_0)\,\Delta_{FD}|_{\max} + \Gamma_{PP,h} \end{aligned} \tag{G.9}$$

$\Delta_{STM}$ denotes the duration of the Self-Test mode for a ROBUS node, measured in units of local clock 
ticks. $\Delta_{STM}$ is assumed constant. The duration of the Self-Test mode must satisfy the timing requirements 
for the expected transient-fault scenarios. To increase the probability that a restarting node does not trust 
an affected node, we require that the restarting nodes exit the Self-Test mode only after the latest time at 
which affected nodes can be recognized as untrustworthy. So: 

$$\begin{aligned} t_{restart,l} + \Delta_{STM}/(1 + \rho_0) &\ge t_{restart,h} + \Gamma_{PP,h} \\ \Delta_{STM}/(1 + \rho_0) &\ge \pi_{restart} + \Gamma_{PP,h} \\ \Delta_{STM}/(1 + \rho_0) &\ge \delta_{OUAD}|_{\max} \end{aligned} \tag{G.10}$$

In terms of local clock ticks, the above inequality corresponds to the following constraint: 

$$\Delta_{STM} \ge \lceil (1 + \rho_0)\,\delta_{OUAD}|_{\max} \rceil \tag{G.11}$$
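
A minimal sketch of how the smallest admissible Self-Test duration follows from (G.8), (G.9), and (G.11) is given below. The function names are hypothetical, `gamma_pp_h` stands for the upper bound on the point-to-point reception delay that appears in (G.9), and the numbers in the usage note are illustrative only.

```python
import math


def ouad_max(delta_fcp_max_s: float, tau0_s: float, rho0: float,
             Delta_fd_max: float, gamma_pp_h: float) -> float:
    """Upper bound on the Observed Upset Abatement Delay (nominal ticks)."""
    # (G.8): Local Upset Abatement Delay bound.
    delta_luad_max = delta_fcp_max_s / tau0_s + (1 + rho0) * Delta_fd_max
    # (G.9): add the upper bound on the reception delay.
    return delta_luad_max + gamma_pp_h


def min_self_test_ticks(delta_ouad_max: float, rho0: float) -> int:
    """(G.11): smallest Self-Test duration in local clock ticks."""
    return math.ceil((1 + rho0) * delta_ouad_max)
```

With a 1 ms phenomenon, a 1 µs tick, a drift bound of 1e-4, a 100-tick detection delay, and a 10-tick reception delay bound, the OUAD bound is about 1110.01 nominal ticks and the Self-Test mode needs at least 1111 local ticks.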

G.2.3.2. Bound on the relative local-time skew at the end of the Self-Test mode 

$\pi_{STM}$ denotes the upper bound on the relative time skew at the end of the Self-Test mode, measured in 
nominal clock ticks. So: 

$$\pi_{STM} = \max(\pi_{POE}, \pi_{restart}) + [(1 + \rho_0) - 1/(1 + \rho_0)]\,\Delta_{STM} \tag{G.12}$$
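
The skew at the end of the Self-Test mode is the entry skew plus the divergence accumulated between the fastest and slowest admissible clocks while counting the Self-Test duration in local ticks. A hedged sketch of (G.12), with illustrative function names and placeholder values:

```python
def drift_factor(rho0: float) -> float:
    """Nominal ticks of divergence between the slowest and fastest clocks
    per local tick counted: (1 + rho0) - 1/(1 + rho0)."""
    return (1 + rho0) - 1 / (1 + rho0)


def pi_stm(pi_poe: float, pi_restart: float,
           rho0: float, Delta_stm: float) -> float:
    """(G.12): skew bound at the end of the Self-Test mode (nominal ticks)."""
    return max(pi_poe, pi_restart) + drift_factor(rho0) * Delta_stm
```

For a drift bound of 1e-4 the factor is roughly 2e-4, so a 1111-tick Self-Test mode adds only about 0.22 nominal ticks to the entry skew.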


G.2.4. Clique Detection mode 

The Clique Detection mode is composed of three main operations: Local Diagnosis Acquisition, 
Synchronization Acquisition, and Collective Diagnosis Acquisition. 


G.2.4.1. Local Diagnosis Acquisition 

Local Diagnosis Acquisition is composed of two consecutive observation intervals, each with a 
duration at least as large as a resynchronization interval. $\Delta_{LDA,begin}$ denotes the delay from the time a node 
exits the Self-Test mode until the beginning of the first observation interval, measured in local-clock 
ticks. The value of $\Delta_{LDA,begin}$ is determined by the implementation and assumed constant. 


G.2.4.1.1. Bound on the duration of an observation phase 

It is assumed that, at the earliest, a node in Local Diagnosis Acquisition can detect the absence of a 
valid clique as soon as it enters the observation phase. $\Delta_{LDA,OW}$ denotes the duration of the observation 
intervals (or “windows”), measured in local-clock ticks. The value of $\Delta_{LDA,OW}$ is determined by the 
implementation and assumed constant. $\Delta_{LDA,OW}$ should be large enough to cover the duration of a 
resynchronization cycle measured in local-clock ticks. So: 

$$\Delta_{LDA,OW} \ge P \tag{G.13}$$

P is given in Appendix C. 


G.2.4.1.2. Bound on the duration of Local Diagnosis Acquisition 

Let $\delta_{LDA}$ denote the actual duration of Local Diagnosis Acquisition measured in nominal clock ticks. 

$$\delta_{LDA}|_{\min} = \Delta_{LDA,begin}/(1 + \rho_0) \tag{G.14}$$

$$\delta_{LDA}|_{\max} = (1 + \rho_0)(\Delta_{LDA,begin} + 2\Delta_{LDA,OW}) \tag{G.15}$$
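
The bounds (G.14) and (G.15), together with the window-size constraint (G.13), can be checked numerically with a sketch like the following. The function name and the `resync_period` argument (standing for P from Appendix C) are assumptions for this example, as are any values supplied to it.

```python
def lda_duration_bounds(Delta_begin: float, Delta_ow: float,
                        rho0: float, resync_period: float):
    """Return (min, max) duration of Local Diagnosis Acquisition in nominal ticks.

    Delta_begin   : delay before the first observation window (local ticks)
    Delta_ow      : duration of one observation window (local ticks)
    resync_period : resynchronization period P (local ticks)
    """
    # (G.13): each window must cover a full resynchronization cycle.
    if Delta_ow < resync_period:
        raise ValueError("observation window shorter than resynchronization period")
    # (G.14): earliest exit, seen by the fastest clock, right at window start.
    d_min = Delta_begin / (1 + rho0)
    # (G.15): latest exit, seen by the slowest clock, after both windows.
    d_max = (1 + rho0) * (Delta_begin + 2 * Delta_ow)
    return d_min, d_max
```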

G.2.4.2. Synchronization Acquisition 

Synchronization Acquisition is composed of the Frame Synchronization and Synchronization Capture 
protocols. Synchronization Acquisition ends with the synchronization reset, at which point the local time 
is set to 0. 


G.2.4.2.1. Frame Synchronization 

It is assumed that a node can detect the absence of a valid clique at any time during Synchronization 
Acquisition. $\Delta_{FS,begin}$ denotes the delay from the end of the second observation window during Local 
Diagnosis Acquisition to the beginning of the Frame Synchronization protocol during Synchronization 
Acquisition, measured in local clock ticks. $\Delta_{FS,begin}$ is implementation-dependent and assumed constant. 

$\Delta_{FS}$ denotes the actual duration of the execution of the Frame Synchronization protocol measured in 
local clock ticks. $\Delta_{FS}$ is given in Appendix C. 

$\delta_{FS}$ denotes the actual duration of the Frame Synchronization protocol measured in nominal clock 
ticks. 

$$\delta_{FS}|_{\max} = (1 + \rho_0)\,\Delta_{FS}|_{\max} \tag{G.16}$$


G.2.4.3. Synchronization Capture 

We assume that the Synchronization Capture protocol begins immediately after the Frame 
Synchronization protocol is complete. 


G.2.4.3.1. Bound on the duration of the Synchronization Capture protocol 

$\delta_{SC}$ denotes the actual duration of the execution of the Synchronization Capture protocol measured in 
nominal clock ticks. The execution of the protocol may begin shortly after the ECHO messages are 
transmitted by the clique during the execution of the Synchronization Preservation protocol. In that case, 
the end of the Synchronization Capture protocol would occur after the reset is applied during the next 
execution of the Synchronization Preservation protocol. To specify a bound for the duration of the 
Synchronization Capture protocol, we consider an interval containing two consecutive executions of the 
Synchronization Preservation protocol. $\delta_{SP}|_{\max}$ denotes the upper bound on the real-time duration of the 
execution of the Synchronization Preservation protocol. $\delta_{SP}|_{\max}$ is given in Appendix C. $T_{SP}$ denotes the 
scheduled local time at which the execution of the Synchronization Preservation protocol begins. 

$$\begin{aligned} \delta_{SC}|_{\max} &= P_{\max} + \delta_{SP}|_{\max} \\ &= (1 + \rho_0)\,T_{SP} + 2\,\delta_{SP}|_{\max} \end{aligned} \tag{G.17}$$


G.2.4.4. Bound on the duration of Synchronization Acquisition 

Let $\delta_{SA}$ denote the actual duration of the execution of the Synchronization Acquisition measured in 
nominal clock ticks. 

$$\delta_{SA}|_{\max} = (1 + \rho_0)\,\Delta_{FS,begin} + \delta_{FS}|_{\max} + \delta_{SC}|_{\max} \tag{G.18}$$

$\Delta_{SA}$ denotes the actual duration of Synchronization Acquisition measured in local clock ticks. We 
want to ensure that a count of $\Delta_{SA}|_{\max}$ local ticks takes no fewer than $\delta_{SA}|_{\max}$ nominal ticks. 

$$\Delta_{SA}|_{\max}/(1 + \rho_0) \ge \delta_{SA}|_{\max} \tag{G.19}$$

We choose the minimum value that satisfies the constraint. That is: 

$$\Delta_{SA}|_{\max} = \lceil (1 + \rho_0)\,\delta_{SA}|_{\max} \rceil \tag{G.20}$$
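
Equations (G.16) through (G.20) chain together into a single local-tick budget for Synchronization Acquisition. The sketch below is one way to compute it; the function names are hypothetical and the inputs (which in the design come from the implementation and Appendix C) are placeholders.

```python
import math


def sc_duration_max(T_sp: float, delta_sp_max: float, rho0: float) -> float:
    """(G.17): bound on Synchronization Capture duration (nominal ticks)."""
    return (1 + rho0) * T_sp + 2 * delta_sp_max


def sa_duration_max(Delta_fs_begin: float, Delta_fs_max: float,
                    T_sp: float, delta_sp_max: float, rho0: float) -> float:
    """(G.18): bound on Synchronization Acquisition duration (nominal ticks)."""
    delta_fs_max = (1 + rho0) * Delta_fs_max            # (G.16)
    return ((1 + rho0) * Delta_fs_begin + delta_fs_max
            + sc_duration_max(T_sp, delta_sp_max, rho0))


def sa_local_tick_budget(delta_sa_max: float, rho0: float) -> int:
    """(G.20): smallest local-tick count that satisfies (G.19)."""
    return math.ceil((1 + rho0) * delta_sa_max)
```

The ceiling in (G.20) guarantees that even the fastest admissible clock, which counts its local ticks in the least real time, does not finish the count before the worst-case protocol duration has elapsed.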


G.2.4.5. Bound on the duration of the Clique Detection mode 

Synchronization Acquisition ends with the synchronization reset, at which point the local time is set to 
0. From that point on, the local time should be synchronized to the clique in Preservation mode. The 
delays to begin and complete the Collective Diagnosis Acquisition protocol in the Clique Detection mode 
are the same as for the Collective Diagnosis protocol in Clique Preservation mode. $\Delta_{CD,begin}$ denotes the 
time from the synchronization reset to the beginning of the Collective Diagnosis protocol, measured in 
local clock ticks. $\Delta_{CD}$ denotes the time to complete the execution of the Collective Diagnosis protocol in 
local clock ticks. The transition to the Clique Join mode occurs at the beginning of execution of the 
Schedule Update protocol. Before that point, a detected failure attributable to the absence of a clique 
results in a transition to the Clique Initialization mode. $\Delta_{SU,begin}$ denotes the time from the end of the 
Collective Diagnosis protocol to the beginning of the Schedule Update protocol, measured in local clock 
ticks. $\Delta_{CD,begin}$, $\Delta_{CD}$, and $\Delta_{SU,begin}$ are implementation-dependent and determined by the time-indexed 
operation schedule specifying the timing for bus activities. Section 3 presents the concept of distributed 
synchronous composition. 

After detecting the absence of a valid clique, a node clears its state and transitions to the Clique 
Initialization mode. $\Delta_{CDM-CIM}$ denotes the delay to transition to the Clique Initialization mode after 
detecting the absence of a valid clique, measured in units of local clock ticks. $\Delta_{CDM-CIM}$ is 
implementation-dependent and assumed constant. $\delta_{CDM}$ denotes the actual duration of the Clique 
Detection mode for a ROBUS node, measured in units of nominal clock ticks. 

$$\delta_{CDM}|_{\min} = \delta_{LDA}|_{\min} + \Delta_{CDM-CIM}/(1 + \rho_0) \tag{G.21}$$

$$\delta_{CDM}|_{\max} = \delta_{LDA}|_{\max} + \delta_{SA}|_{\max} + [(1 + \rho_0)(\Delta_{CD,begin} + \Delta_{CD} + \Delta_{SU,begin} + \Delta_{CDM-CIM})] \tag{G.22}$$


G.2.4.6. Bound on the relative local-time skew at the beginning of the Clique Initialization mode 

$\pi_{CIM,BEGIN}$ denotes the upper bound on the relative time skew at the beginning of the Clique 
Initialization mode, measured in nominal clock ticks. 

$$\pi_{CIM,BEGIN} = \pi_{STM} + (\delta_{CDM}|_{\max} - \delta_{CDM}|_{\min}) \tag{G.23}$$
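
The skew at the start of the Clique Initialization mode grows by the spread between the shortest and longest possible stays in the Clique Detection mode. A sketch of (G.21) through (G.23), using hypothetical function and parameter names and placeholder inputs:

```python
def cdm_duration_bounds(d_lda_min: float, d_lda_max: float, d_sa_max: float,
                        D_cd_begin: float, D_cd: float, D_su_begin: float,
                        D_cdm_cim: float, rho0: float):
    """Return (min, max) duration of the Clique Detection mode in nominal ticks."""
    # (G.21): earliest exit, absence of a clique detected at the first window.
    d_min = d_lda_min + D_cdm_cim / (1 + rho0)
    # (G.22): latest exit, just before the Schedule Update protocol begins.
    d_max = d_lda_max + d_sa_max + (1 + rho0) * (
        D_cd_begin + D_cd + D_su_begin + D_cdm_cim)
    return d_min, d_max


def pi_cim_begin(pi_stm: float, d_cdm_min: float, d_cdm_max: float) -> float:
    """(G.23): skew bound at the start of Clique Initialization (nominal ticks)."""
    return pi_stm + (d_cdm_max - d_cdm_min)
```

Because the spread term is on the order of the full Clique Detection duration, it typically dominates the skew inherited from the Self-Test mode.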


G.2.5. Initial Diagnosis 

To simplify the presentation, we would like to compute a single upper bound for the relative local-time 
skew during the execution of the Initial Diagnosis and Initial Synchronization protocols. Let $\pi_{ID+IS}$ 
denote that bound, measured in nominal clock ticks. 

For Initial Diagnosis, the BIUs and RMUs are assumed to have the same timing characteristics. The 
analysis presented here does not refer to the kind of the node sending or receiving messages for any of the 
protocol processes. 

$\Delta_{ID,begin}$ denotes the delay from the time a node enters the Clique Initialization mode until the time it 
begins the execution of the Initial Diagnosis protocol, measured in units of local clock ticks. The value of 
$\Delta_{ID,begin}$ is determined by the implementation and assumed constant. 


G.2.5.1. Communication between processes P0 and P1 

The following variables are defined: $T_{ID}$ denotes the local time triggering the execution of the Initial 
Diagnosis protocol; $T_{ID,P0-P1,REF}$ denotes the reference time for the communication between processes P0 
and P1; $T_{ID,P0,SND}$ denotes the time at which process P0 sends the message; $T_{ID,P1,RCV,E}$ denotes the 
expected time of reception in process P1; $S_{ID,P0}$ denotes the Send Process delay for process P0; 
$\Delta_{ID,P1,RCVWND}$ denotes the delay from the communication reference time to the opening of the receive 
window in process P1; $R_{PP}$ denotes the nominal point-to-point reception delay; $W_{ID,Deskew}$ denotes the size 
of the deskewing window in process P1; $W_{ID,Deskew,pre}$ denotes the pre-expectation window in process P1 
(i.e., the size of the section of the deskewing window before the expected time of reception); $W_{ID,Deskew,post}$ 
denotes the post-expectation window in process P1 (i.e., the size of the section of the deskewing window 
after the expected time of reception); $\Delta_{ID,PP,RCV}|_{abs-max}$ denotes the absolute value of the maximum error in 
the actual time of reception in process P1 for a good source-receiver pair; $C_{ID,P1}$ denotes the computation 
delay in process P1 (the computation delay is measured from the end of the deskewing window, and $C_{ID,P1}$ is 
assumed constant); $\Delta_{ID,P1,C-END}$ denotes the delay in process P1 from the end of the computation to the end 
of the execution of the Initial Diagnosis protocol ($\Delta_{ID,P1,C-END}$ is assumed constant); and $\Delta_{ID}$ denotes the 
duration of the execution of the Initial Diagnosis protocol. 

$T_{ID}$ is the reference time for the communication between processes P0 and P1. Given that the local 
time is reset at the start of the Clique Initialization mode, then: 

$$T_{ID,P0-P1,REF} = T_{ID} = \Delta_{ID,begin} \tag{G.24}$$

To determine $W_{ID,Deskew}$, we need the maximum error in the expected time of reception for the Initial 
Diagnosis protocol messages, $\Delta_{ID,PP,RCV}|_{abs-max}$. Based on the analysis presented in Appendix B for 
point-to-point communication: 

$$\Delta_{ID,PP,RCV}|_{abs-max} = \lfloor (1 + \rho_0)(\pi_{ID+IS} + \max(\mu_{PP,l}, \mu_{PP,h})) \rfloor \tag{G.25}$$

$\mu_{PP,l}$ and $\mu_{PP,h}$ are given in Appendix B. So, for the deskewing window: 

$$W_{ID,Deskew} = 2\,\Delta_{ID,PP,RCV}|_{abs-max} + 1 \tag{G.26}$$

$$W_{ID,Deskew,pre} = \Delta_{ID,PP,RCV}|_{abs-max} \tag{G.27}$$

$$W_{ID,Deskew,post} = \Delta_{ID,PP,RCV}|_{abs-max} + 1 \tag{G.28}$$
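
The deskewing window of (G.25) through (G.28) is centered on the expected reception time and sized from the worst-case reception error. A small sketch, with a hypothetical function name and illustrative inputs (the reception-delay deviation bounds are taken as given, as in Appendix B):

```python
import math


def deskew_window(pi_id_is: float, mu_pp_l: float, mu_pp_h: float,
                  rho0: float) -> dict:
    """Sizes of the Initial Diagnosis deskewing window, in local clock ticks."""
    # (G.25): worst-case absolute error in the time of reception.
    err = math.floor((1 + rho0) * (pi_id_is + max(mu_pp_l, mu_pp_h)))
    return {
        "size": 2 * err + 1,   # (G.26) whole window
        "pre": err,            # (G.27) section before the expected reception
        "post": err + 1,       # (G.28) section after the expected reception
    }
```

By construction the pre- and post-expectation sections add up to the full window size.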


We expect the upper bound on the relative local-time skew during the execution of the protocol to be 
much larger than any minimum timing constraints associated with the process of communication. Based 
on this, we assume that the following condition holds for the communication between processes P0 and 
P1. 

$$S_{ID,P0}|_{\min} + R_{PP} < \Delta_{ID,P1,RCVWND}|_{\min} + W_{ID,Deskew,pre} \tag{G.29}$$

For this case: 

$$S_{ID,P0} = \Delta_{ID,P1,RCVWND}|_{\min} + W_{ID,Deskew,pre} - R_{PP} \tag{G.30}$$

And: 

$$\Delta_{ID,P1,RCVWND} = \Delta_{ID,P1,RCVWND}|_{\min} \tag{G.31}$$

So: 

$$\begin{aligned} T_{ID,P0,SND} &= T_{ID,P0-P1,REF} + S_{ID,P0} \\ &= T_{ID} + \Delta_{ID,P1,RCVWND}|_{\min} + W_{ID,Deskew,pre} - R_{PP} \end{aligned} \tag{G.32}$$

And: 

$$\begin{aligned} T_{ID,P1,RCV,E} &= T_{ID,P0,SND} + R_{PP} \\ &= T_{ID} + \Delta_{ID,P1,RCVWND}|_{\min} + W_{ID,Deskew,pre} \end{aligned} \tag{G.33}$$


G.2.5.2. Bound on the duration of the Initial Diagnosis protocol 


Let $T_{ID,P1,C}$ denote the local time at which the Computation Process outputs the result for process P1. 

$$T_{ID,P1,C} = T_{ID,P1,RCV,E} + W_{ID,Deskew,post} + C_{ID,P1} \tag{G.34}$$

Let $T_{ID,P1,END}$ denote the local time at which the execution of the Initial Diagnosis protocol ends. 

$$T_{ID,P1,END} = T_{ID,P1,C} + \Delta_{ID,P1,C-END} \tag{G.35}$$

The duration of the execution of the Initial Diagnosis protocol is: 

$$\begin{aligned} \Delta_{ID} &= T_{ID,P1,END} - T_{ID} \\ &= \Delta_{ID,P1,RCVWND}|_{\min} + W_{ID,Deskew} + C_{ID,P1} + \Delta_{ID,P1,C-END} \end{aligned} \tag{G.36}$$
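
The local-time schedule of the Initial Diagnosis protocol, from the send time through the end of the protocol, can be traced step by step. The sketch below follows (G.30) and (G.32) through (G.36); the function name, dictionary keys, and sample values are assumptions for illustration.

```python
def initial_diagnosis_timeline(T_id: int, D_rcvwnd_min: int, W_pre: int,
                               W_post: int, R_pp: int, C_p1: int,
                               D_c_end: int) -> dict:
    """Local-time milestones of Initial Diagnosis, in local clock ticks."""
    S_p0 = D_rcvwnd_min + W_pre - R_pp   # (G.30) send delay in process P0
    T_snd = T_id + S_p0                  # (G.32) send time
    T_rcv_e = T_snd + R_pp               # (G.33) expected reception time
    T_c = T_rcv_e + W_post + C_p1        # (G.34) computation result available
    T_end = T_c + D_c_end                # (G.35) end of the protocol
    return {
        "send": T_snd,
        "expected_reception": T_rcv_e,
        "computation_done": T_c,
        "end": T_end,
        "duration": T_end - T_id,        # (G.36)
    }
```

The resulting duration equals the receive-window delay plus the full deskewing window (pre plus post sections), the computation delay, and the final delay, which is the closed form in (G.36).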

G.2.6. Initial Synchronization 

Let $T_{IS}$ denote the local time triggering the execution of the Initial Synchronization protocol. $\Delta_{IS,begin}$ 
denotes the delay from the end of Initial Diagnosis to the beginning of Initial Synchronization, measured 
in units of local clock ticks. The value of $\Delta_{IS,begin}$ is determined by the implementation and assumed 
constant. 

$$\Delta_{IS,begin} = T_{IS} - T_{ID,P1,END} \tag{G.37}$$

G.2.6.1. Bound on the relative skew at the beginning of the Initial Synchronization protocol 

Let $\pi_{IS,BEGIN}$ denote the upper bound on the relative local-time skew at the beginning of the Initial 
Synchronization protocol, measured in nominal clock ticks. 

$$\pi_{IS,BEGIN} = \pi_{CIM,BEGIN} + [(1 + \rho_0) - 1/(1 + \rho_0)](\Delta_{ID,begin} + \Delta_{ID} + \Delta_{IS,begin}) \tag{G.38}$$

G.2.6.2. Communication between processes P0 and P1 

This is discussed in Appendix C. There, $\pi_{IS}$ denotes the bound on the relative skew during the 
execution of the Initial Synchronization protocol. Thus: 

$$\pi_{IS} = \pi_{ID+IS} \tag{G.39}$$


G.2.6.3. Bound on the duration of the Initial Synchronization protocol 

$\delta_{IS}|_{\max}$ denotes the upper bound on the real-time duration of the execution of the Initial Synchronization 
protocol measured from the earliest time at which a node begins executing the protocol to the latest time 
at which a node applies the synchronization reset. $\delta_{IS,Sync}|_{\max}$, $\Delta_{IS,P2,H,h}$, $\Delta_{IS,P3,H,h}$, and $\Delta_{IS,P4,H,h}$ are given by 
$\delta_{Sync}|_{\max}$, $\Delta_{Sync,P2,H,h}$, $\Delta_{Sync,P3,H,h}$, and $\Delta_{Sync,P4,H,h}$ in Appendix C with $B_{P0}$ replaced by $B_{IS,P0}$. 

$$\delta_{IS}|_{\max} = \pi_{IS,BEGIN} + \max(\Delta_{IS,P2,H,h}, \Delta_{IS,P3,H,h}, \Delta_{IS,P4,H,h}) \tag{G.40}$$

Let $\Delta_{IS}|_{\max}$ denote the upper bound on the duration of the execution of the Initial Synchronization 
protocol measured in local clock ticks. We want the fastest count of $\Delta_{IS}|_{\max}$ ticks to take at least $\delta_{IS}|_{\max}$ nominal ticks. 

$$\Delta_{IS}|_{\max}/(1 + \rho_0) \ge \delta_{IS}|_{\max} \tag{G.41}$$

We choose the following value for $\Delta_{IS}|_{\max}$. 

$$\Delta_{IS}|_{\max} = \lceil (1 + \rho_0)\,\delta_{IS}|_{\max} \rceil \tag{G.42}$$

G.2.7. Bound on the relative skew during Initial Diagnosis and Initial Synchronization 

The bound on the relative local-time skew during Initial Diagnosis and Initial Synchronization is: 

$$\pi_{ID+IS} = \pi_{IS,BEGIN} + [(1 + \rho_0) - 1/(1 + \rho_0)]\,\delta_{IS}|_{\max} \tag{G.43}$$

The following variables are defined in order to simplify the expressions presented below. 

$$X_{IS,H,h} = \max(\Delta_{IS,P2,H,h}, \Delta_{IS,P3,H,h}, \Delta_{IS,P4,H,h}) \tag{G.44}$$

$$\sigma_0 = [(1 + \rho_0) - 1/(1 + \rho_0)] \tag{G.45}$$

Then: 

$$\begin{aligned} \pi_{ID+IS} &= \pi_{IS,BEGIN} + \sigma_0(\pi_{IS,BEGIN} + X_{IS,H,h}) \\ &= \sigma_0 X_{IS,H,h} + (1 + \sigma_0)\,\pi_{IS,BEGIN} \\ &= \sigma_0 X_{IS,H,h} + (1 + \sigma_0)[\pi_{CIM,BEGIN} + \sigma_0(\Delta_{ID,begin} + \Delta_{ID} + \Delta_{IS,begin})] \\ &= \sigma_0 X_{IS,H,h} + (1 + \sigma_0)[\pi_{CIM,BEGIN} + \sigma_0(\Delta_{ID,begin} + \Delta_{IS,begin})] + \sigma_0(1 + \sigma_0)\,\Delta_{ID} \end{aligned} \tag{G.46}$$

The following inequality holds for $W_{ID,Deskew}$: 

$$W_{ID,Deskew} \le 2(1 + \rho_0)[\pi_{ID+IS} + \max(\mu_{PP,l}, \mu_{PP,h})] + 1 \tag{G.47}$$

Applying this inequality to $\Delta_{ID}$, then: 

$$\begin{aligned} \pi_{ID+IS} \le{} & \sigma_0 X_{IS,H,h} + (1 + \sigma_0)[\pi_{CIM,BEGIN} + \sigma_0(\Delta_{ID,begin} + \Delta_{IS,begin})] \\ & + \sigma_0(1 + \sigma_0)\{\Delta_{ID,P1,RCVWND}|_{\min} + C_{ID,P1} + \Delta_{ID,P1,C-END} \\ & + [2(1 + \rho_0)(\pi_{ID+IS} + \max(\mu_{PP,l}, \mu_{PP,h})) + 1]\} \end{aligned} \tag{G.48}$$

Again, the definition of the following variable simplifies the presentation. 

$$\begin{aligned} Y ={} & \sigma_0 X_{IS,H,h} + (1 + \sigma_0)[\pi_{CIM,BEGIN} + \sigma_0(\Delta_{ID,begin} + \Delta_{IS,begin})] \\ & + \sigma_0(1 + \sigma_0)(\Delta_{ID,P1,RCVWND}|_{\min} + C_{ID,P1} + \Delta_{ID,P1,C-END}) \end{aligned} \tag{G.49}$$

So: 

$$\begin{aligned} \pi_{ID+IS} &\le Y + \sigma_0(1 + \sigma_0)[2(1 + \rho_0)(\pi_{ID+IS} + \max(\mu_{PP,l}, \mu_{PP,h})) + 1] \\ \pi_{ID+IS} &\le \{Y + \sigma_0(1 + \sigma_0)[2(1 + \rho_0)\max(\mu_{PP,l}, \mu_{PP,h}) + 1]\}/\{1 - 2\sigma_0(1 + \sigma_0)(1 + \rho_0)\} \end{aligned} \tag{G.50}$$

We choose the right side of this expression as the value for $\pi_{ID+IS}$. 

$$\pi_{ID+IS} = \{Y + \sigma_0(1 + \sigma_0)[2(1 + \rho_0)\max(\mu_{PP,l}, \mu_{PP,h}) + 1]\}/\{1 - 2\sigma_0(1 + \sigma_0)(1 + \rho_0)\} \tag{G.51}$$
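
The skew bound appears on both sides of (G.48) because the deskewing window is itself sized from the skew bound; (G.51) resolves this circular dependency in closed form. The sketch below evaluates (G.45), (G.49), and (G.51). Function and parameter names are illustrative, inputs are placeholders, and the explicit check that the denominator is positive is an added assumption that holds for small drift bounds.

```python
def sigma0(rho0: float) -> float:
    """(G.45): drift factor (1 + rho0) - 1/(1 + rho0)."""
    return (1 + rho0) - 1 / (1 + rho0)


def y_term(X_is: float, pi_cim_begin: float, D_id_begin: float,
           D_is_begin: float, D_rcvwnd_min: float, C_p1: float,
           D_c_end: float, rho0: float) -> float:
    """(G.49): the part of the bound that does not depend on the skew itself."""
    s = sigma0(rho0)
    return (s * X_is
            + (1 + s) * (pi_cim_begin + s * (D_id_begin + D_is_begin))
            + s * (1 + s) * (D_rcvwnd_min + C_p1 + D_c_end))


def pi_id_is(Y: float, mu_pp_l: float, mu_pp_h: float, rho0: float) -> float:
    """(G.51): skew bound during Initial Diagnosis and Initial Synchronization."""
    s = sigma0(rho0)
    gain = 2 * s * (1 + s) * (1 + rho0)
    if gain >= 1:
        # Assumed guard: the closed form is only meaningful for gain < 1.
        raise ValueError("drift bound too large for the closed-form solution")
    mu = max(mu_pp_l, mu_pp_h)
    return (Y + s * (1 + s) * (2 * (1 + rho0) * mu + 1)) / (1 - gain)
```

With ideal clocks (a drift bound of zero) the factor vanishes and the bound reduces to the skew at the beginning of the Clique Initialization mode.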


G.3. Clique join 

The clique-join recovery cases include joining after a power-on enable trigger and rejoining triggered 
by the detection of a failure. A clique is present on the bus for these recovery cases, and thus the major 
mode path through the Clique Join mode is followed to reach the Clique Preservation mode. The most 
important element of clique-join recovery is the loading of the state information from the clique. 

The full state of the clique consists of the local time, the diagnostic state (i.e., suspicions, accusations, 
and convictions), and the PE communication schedule. All of these state variables are recomputed in 
each execution cycle. The local time is recomputed periodically by the Synchronization Preservation 
protocol. The suspicions are accumulated during a diagnostic cycle and then cleared after being 
processed to generate the suspicions-based accusations. The accusations are also accumulated during a 
diagnostic cycle and cleared when the convictions are updated. The convictions are recomputed at the 
end of every diagnostic cycle by the Collective Diagnosis protocol. The PE communication schedule is 
loaded anew in the Schedule Update mode immediately after each execution of the Collective Diagnosis 
protocol. 

This state-update pattern results in a straightforward state-acquisition process for a recovering node. 
Because the suspicions and accusations are cleared every diagnostic cycle, keeping up with a clique after 
synchronizing and loading the convictions is a matter of receiving and processing messages as the clique 
members do. The most critical aspect of the recovery process is achieving the proper diagnostic state 
before attempting to capture the time and convictions state variables. To ensure a successful state 
loading, the sets of eligible voters in Synchronization Acquisition and Collective Diagnosis Acquisition 
should have more trustworthy nodes than untrustworthy ones. The probability of achieving this is 
limited by the ability of Local Diagnosis Acquisition to correctly diagnose the observed nodes. 

[Pike 05] presents a formal verification of the synchronization sequence, including the protocols in the 
Local Diagnosis Acquisition and Synchronization Acquisition modes. 

The presentation for the clique-initialization recovery cases up to the completion of the Clique 
Detection mode applies for clique-join recovery. No further timing analysis is presented here. 